Hi David,

The challenge you described is usually referred to as "(in-document) coreference resolution". It is closely related to the entity disambiguation problem, since entity disambiguation can be seen as cross-document coreference resolution (using identifiers from a pre-established KB). However, I think it is worth treating it separately from (but in close connection with) the targeted entity disambiguation problem, because there are several alternatives, each with pros and cons, including:

1. clustering mentions at recognition time and then disambiguating,
2. clustering mentions after disambiguating, or
3. jointly disambiguating and clustering.

In DBpedia Spotlight we use a very simple heuristic rule: if a first name or last name is spotted, we look backwards for a full name and assign every mention in the chain to the same entity [1]. It is a very crude assumption, but it works quite well in practice.

Cheers,
Pablo

[1] https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/core/src/main/scala/org/dbpedia/spotlight/filter/annotations/CoreferenceFilter.scala

On Thu, Aug 23, 2012 at 5:27 PM, David Riccitelli <da...@insideout.io> wrote:

> Thanks Kritarth,
>
> Let me discuss another case, with another example: there's a text like this:
> "Valentino Rossi won the MotoGP. Everybody loves Rossi."
>
> Right now the enhancer correctly identifies "Valentino Rossi (racer)" in
> the TextAnnotation "Valentino Rossi", while it makes different suggestions for
> the TextAnnotation "Rossi", sorted by ranking (unfortunately Valentino
> Rossi not being the first):
> - "Daniele De Rossi (soccer player)"
> - "Vasco Rossi (singer)"
> - "Valentino Rossi (racer)"
>
> In this case would the disambiguation engine boost the score of the
> EntityAnnotation "Valentino Rossi (racer)"?
>
> BR,
> David
>
> On Thu, Aug 23, 2012 at 4:43 PM, kritarth anand <kritarth.an...@gmail.com> wrote:
>
> > Hi David,
> > Thanks for your interest.
> >
> > What would a sentence like this yield, "Paris is not the city in United
> > States"?
> >
> > It would yield Paris, Texas too. Well, that is one of the reasons the
> > problem is very hard.
> >
> > Kritarth
> >
> > On Thu, Aug 23, 2012 at 7:06 PM, David Riccitelli <da...@insideout.io> wrote:
> >
> > > What would a sentence like this yield, "Paris is not the city in United
> > > States"?
> > >
> > > On Thu, Aug 23, 2012 at 4:23 PM, kritarth anand <kritarth.an...@gmail.com> wrote:
> > >
> > > > Dear members of Stanbol community,
> > > >
> > > > I would like to discuss the next few iterations of the
> > > > Disambiguation Engine. A few versions of the engine have been
> > > > prepared, and I would like to briefly describe them below. I hope
> > > > to become a permanent committer for Stanbol if my contribution is
> > > > considered after this GSOC period. I will be committing the code
> > > > versions and attaching a patch to JIRA soon.
> > > >
> > > > 1. How the disambiguation problem was approached
> > > > For a given text annotation there may be many Entity Annotations
> > > > mapped, and they need to be ranked in order of their likelihood.
> > > > Example: "Paris is a small city in the United States."
> > > >
> > > > a. Without disambiguation (using DBpedia as the vocabulary), "Paris"
> > > > in this sentence has three entity annotations mapped: 1. Paris,
> > > > France; 2. Paris, Texas; 3. Paris, *Something*. (The entity mapped
> > > > with the highest fise:confidence is Paris, France.)
> > > > b. How would a human disambiguate? On reading the line, an
> > > > individual thinks of the context the text is referring to. Since the
> > > > text talks about Paris and also about the United States, the Paris
> > > > mentioned here is more like Paris, Texas (which is in the United
> > > > States) and therefore must refer to it.
> > > > c. The implementation takes inspiration from this example and
> > > > roughly follows the pseudocode below.
> > > >
> > > > for (K : TextAnnotations) {
> > > >     List EntityAnnotations = getEntityAnnotationsRelated(K);
> > > >     Context = GetContextInformation(K);
> > > >
> > > >     List Results = QueryMLTVocabularies(K, Context);
> > > >     updateConfidences(Results, EntityAnnotations);
> > > > }
> > > >
> > > > d. My current approach to disambiguation involved a lot of
> > > > variations; for simplicity, I'll talk only about the differences
> > > > in obtaining the "Context".
> > > >
> > > > 2. The Context Procurement:
> > > > a. All Entity Context: The context is decided by all the text
> > > > annotations of the text. It shows good results for shorter texts,
> > > > but introduces a lot of redundant annotations in longer ones, making
> > > > the context less useful.
> > > > b. All Link Context: The context is decided on the basis of the site
> > > > or reference link associated with the text annotations, which of
> > > > course may themselves need to be disambiguated, so it does not
> > > > behave very well.
> > > > c. Selection Context: The selection context basically contains the
> > > > text one sentence before and one sentence after the current one.
> > > > Another version worked with the Text Annotations in this region of
> > > > text.
> > > > d. Vicinity Entity Context: The vicinity annotation detection
> > > > measures distance in the neighborhood of the text annotation.
> > > >
> > > > 3. Future
> > > > a. With a running POC of this engine, it can be used to create a
> > > > more advanced version, like the Spotlight approach or one using the
> > > > Markov Logic Networks discussed earlier.
> > > >
> > >
> > > --
> > > David Riccitelli
> > >
> > > ********************************************************************************
> > > InsideOut10 s.r.l.
> > > P.IVA: IT-11381771002
> > > Fax: +39 0110708239
> > > ---
> > > LinkedIn: http://it.linkedin.com/in/riccitelli
> > > Twitter: ziodave
> > > ---
> > > Layar Partner Network<http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1>
> > > ********************************************************************************
> > >
>
> --
> David Riccitelli

--
---
Pablo N. Mendes
http://pablomendes.com
Events: http://wole2012.eurecom.fr
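
P.S. In case it helps, here is a minimal sketch of the look-backwards rule described above, applied to the "Valentino Rossi" example from this thread. It is written in Python for brevity and is not a port of CoreferenceFilter.scala [1]. The Mention class, the resolve_by_full_name function and the exact matching rule (a single-token mention that equals the first or last token of an earlier multi-token mention) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    """Hypothetical container for one spotted mention (not a Spotlight class)."""
    surface: str            # text as spotted, e.g. "Rossi"
    entity: Optional[str]   # currently chosen entity label, or None


def resolve_by_full_name(mentions: List[Mention]) -> List[Mention]:
    """For every single-token mention, look backwards for a multi-token
    mention whose first or last token is identical, and reuse its entity.
    Mentions are expected in document order."""
    resolved: List[Mention] = []
    for mention in mentions:
        tokens = mention.surface.split()
        entity = mention.entity
        if len(tokens) == 1:
            # Walk backwards over the mentions seen so far.
            for earlier in reversed(resolved):
                earlier_tokens = earlier.surface.split()
                is_full_name = len(earlier_tokens) > 1
                if (is_full_name
                        and earlier.entity is not None
                        and tokens[0] in (earlier_tokens[0], earlier_tokens[-1])):
                    entity = earlier.entity
                    break
        resolved.append(Mention(mention.surface, entity))
    return resolved


# "Valentino Rossi won the MotoGP. Everybody loves Rossi."
mentions = [
    Mention("Valentino Rossi", "Valentino Rossi (racer)"),
    Mention("Rossi", "Daniele De Rossi (soccer player)"),  # top-ranked before the rule
]
result = resolve_by_full_name(mentions)
print(result[1].entity)
```

With this rule the second mention is reassigned to "Valentino Rossi (racer)" regardless of its own ranking, which also shows why the assumption is crude: a text that mentions two different people called Rossi would be merged into one.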