Yes, TAC-KBP is one of the good places to go for Entity Linking (Targeted Disambiguation). There is also a similar effort for co-reference resolution that has been running since the nineties: see MUC-6 (1995), MUC-7 (1997) and ACE (NIST, 2004). More recently, there are the CoNLL shared tasks:
http://aclweb.org/portal/content/conll-2012-shared-task
http://conll.cemantix.org/2011/program.html
But if you go to the ACL, EMNLP, NAACL, CoNLL and COLING proceedings and look for coreference, disambiguation, entity linking, etc., you will find a pile of papers to keep you entertained for weeks. :)

Cheers,
Pablo

MUC-6. 1995. Coreference Task Definition. In Proceedings of the Sixth Message Understanding Conference (MUC-6).
MUC-7. 1997. Coreference Task Definition. In Proceedings of the Seventh Message Understanding Conference (MUC-7).
NIST. 2004. The ACE Evaluation Plan. NIST.

On Mon, Sep 3, 2012 at 5:09 PM, Rafa Haro <rh...@zaizi.com> wrote:
> Hi David,
>
> There are a lot of good papers on entity disambiguation. As Pablo
> says, co-reference resolution and cross-document co-reference resolution
> are closely related fields, and you can surely find good surveys of them.
> Entity disambiguation against knowledge bases is, perhaps, a more recent
> research interest. Of course, one of the most used KBs is Wikipedia. Check
> out this link:
>
> http://www.nist.gov/tac/2012/KBP/index.html
>
> Knowledge Base Population is a special track to promote research in
> automated systems that discover information about named entities as found
> in a large corpus and incorporate this information into a knowledge base.
> One of its tasks is Entity Linking, which in summary consists of linking a
> mention of an entity to its corresponding KB entry. This year's was
> the fourth edition of the competition.
>
> Although some work in KBP is very ad hoc to the datasets of the
> competition, it usually refers to generic state-of-the-art approaches.
>
> Regards
>
> On 03/09/12 16:40, David Riccitelli wrote:
>> Thanks Pablo,
>>
>> I find the disambiguation subject very challenging and intriguing. By any
>> chance, do you (or anyone) have any pointers to some presentations,
>> background documentation or lectures about disambiguation?
>>
>> I would like to keep this topic alive, as disambiguation can truly make a
>> difference in our implementations.
>>
>> Best regards,
>> David
>>
>> On Mon, Sep 3, 2012 at 5:31 PM, Pablo N. Mendes <pablomen...@gmail.com> wrote:
>>> Hi David,
>>>
>>> The challenge you described is usually referred to as "(in document)
>>> coreference resolution". It is closely related to the entity disambiguation
>>> problem, as entity disambiguation can be seen as cross-document coreference
>>> resolution (by using identifiers from a pre-established KB). However, I
>>> think it's worth thinking of it separately from (but in close connection
>>> with) the targeted entity disambiguation problem. This is because there are
>>> many alternatives with pros and cons, including:
>>> 1. clustering mentions at recognition time and then disambiguating,
>>> 2. clustering mentions after disambiguating, or
>>> 3. jointly disambiguating/clustering.
>>>
>>> In DBpedia Spotlight we use a very simple heuristic rule: if a first name
>>> or last name is spotted, we look backwards for a full name and assign
>>> everyone in the chain to the same entity [1]. It is a very crude
>>> assumption, but works quite well in practice.
>>>
>>> Cheers,
>>> Pablo
>>>
>>> [1] https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/core/src/main/scala/org/dbpedia/spotlight/filter/annotations/CoreferenceFilter.scala
>>>
>>> On Thu, Aug 23, 2012 at 5:27 PM, David Riccitelli <da...@insideout.io> wrote:
>>>> Thanks Kritarth,
>>>>
>>>> Let me discuss another case, with another example: there's a text like this:
>>>> "Valentino Rossi won the MotoGP. Everybody loves Rossi."
>>>>
>>>> Right now the enhancer correctly identifies "Valentino Rossi (racer)" in
>>>> the TextAnnotation "Valentino Rossi", while it makes different suggestions
>>>> for the TextAnnotation "Rossi", sorted by ranking (unfortunately Valentino
>>>> Rossi not being the first):
>>>> - "Daniele De Rossi (soccer player)"
>>>> - "Vasco Rossi (singer)"
>>>> - "Valentino Rossi (racer)"
>>>>
>>>> In this case would the disambiguation engine boost the score of the
>>>> EntityAnnotation "Valentino Rossi (racer)"?
>>>>
>>>> BR,
>>>> David
>>>>
>>>> On Thu, Aug 23, 2012 at 4:43 PM, kritarth anand <kritarth.an...@gmail.com> wrote:
>>>>> Hi David,
>>>>> Thanks for your interest.
>>>>>
>>>>> What would a sentence like this yield, "Paris is not the city in United
>>>>> States" ?
>>>>>
>>>>> It would yield Paris, Texas too. Well, that is one of the reasons the
>>>>> problem is very hard.
>>>>>
>>>>> Kritarth
>>>>>
>>>>> On Thu, Aug 23, 2012 at 7:06 PM, David Riccitelli <da...@insideout.io> wrote:
>>>>>> What would a sentence like this yield, "Paris is not the city in
>>>>>> United States" ?
>>>>>>
>>>>>> On Thu, Aug 23, 2012 at 4:23 PM, kritarth anand <kritarth.an...@gmail.com> wrote:
>>>>>>> Dear members of Stanbol community,
>>>>>>>
>>>>>>> I would like to discuss the next few iterations of the
>>>>>>> Disambiguation Engine. A few versions of the engine have been
>>>>>>> prepared, and I would like to briefly describe them below. I hope to
>>>>>>> become a permanent committer for Stanbol if my contribution is
>>>>>>> considered after this GSoC period. I will be committing the code
>>>>>>> versions and attaching a patch to JIRA soon.
>>>>>>>
>>>>>>> 1. How the disambiguation problem was approached.
>>>>>>> For certain text annotations there might be many Entity Annotations
>>>>>>> mapped, and it was required to rank them in order of their likelihood.
>>>>>>> Example: "Paris is a small city in the United States."
>>>>>>>
>>>>>>> a. Take "Paris" in this sentence without disambiguation (using DBpedia
>>>>>>> as vocabulary). There are three entity annotations mapped: 1. Paris,
>>>>>>> France, 2. Paris, Texas, 3. Paris, *Something*. (The entity mapped with
>>>>>>> the highest fise:confidence is Paris, France.)
>>>>>>> b. Now, how would disambiguation by a human take place? On reading the
>>>>>>> line, an individual thinks of the context the text is referring to.
>>>>>>> Doing so, he realizes that the text talks about Paris and also about
>>>>>>> the United States, so the Paris mentioned here is more like Paris,
>>>>>>> Texas (which is in the United States) and therefore must refer to it.
>>>>>>> c. The approach followed in the implementation takes inspiration from
>>>>>>> this example and roughly follows the pseudocode below.
>>>>>>>
>>>>>>>   for (TextAnnotation k : textAnnotations) {
>>>>>>>       List<EntityAnnotation> entityAnnotations = getEntityAnnotationsRelated(k);
>>>>>>>       Context context = getContextInformation(k);
>>>>>>>       // query the vocabularies (MLT) with the annotation and its context
>>>>>>>       List<Result> results = queryMLTVocabularies(k, context);
>>>>>>>       updateConfidences(results, entityAnnotations);
>>>>>>>   }
>>>>>>>
>>>>>>> d. My current approach to disambiguation involved a lot of
>>>>>>> variations; for simplicity, I'll talk only about the differences in
>>>>>>> obtaining the "Context".
>>>>>>>
>>>>>>> 2. The Context Procurement:
>>>>>>> a. All Entity Context: The context is decided by all the
>>>>>>> text annotations of the text.
>>>>>>> It shows good results for shorter texts, but introduces a lot of
>>>>>>> redundant annotations in longer ones, making the context less useful.
>>>>>>> b. All Link Context: The context is decided on the basis of the site
>>>>>>> or reference links associated with the text annotations, which of
>>>>>>> course may themselves need to be disambiguated, so it does not behave
>>>>>>> very well.
>>>>>>> c. Selection Context: The selection context basically contains the
>>>>>>> text one sentence before and one sentence after the current one.
>>>>>>> Another version worked with the Text Annotations in this region of
>>>>>>> text.
>>>>>>> d. Vicinity Entity Context: The vicinity annotation detection measures
>>>>>>> distance in the neighborhood of the text annotation.
>>>>>>>
>>>>>>> 3. Future
>>>>>>> a. With a running POC of this engine, it can be used to create an
>>>>>>> advanced version, like the Spotlight approach or one using the Markov
>>>>>>> Logic Networks discussed earlier.
>>>>>>
>>>>>> --
>>>>>> David Riccitelli
>>>>>>
>>>>>> ************************************************************************************
>>>>>> InsideOut10 s.r.l.
>>>>>> P.IVA: IT-11381771002
>>>>>> Fax: +39 0110708239
>>>>>> ---
>>>>>> LinkedIn: http://it.linkedin.com/in/riccitelli
>>>>>> Twitter: ziodave
>>>>>> ---
>>>>>> Layar Partner Network: http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1
>>>>>> ************************************************************************************

--
Pablo N. Mendes
http://pablomendes.com
Events: http://wole2012.eurecom.fr
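
For readers of the archive, the "look backwards for a full name" rule that Pablo describes for DBpedia Spotlight can be illustrated on the "Valentino Rossi ... Rossi" example from this thread. What follows is only a minimal sketch of the idea as stated in the email, not the actual CoreferenceFilter code: the `Mention` class and the `propagate_full_names` function are hypothetical names invented for this illustration, and token matching is a simple whitespace split.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mention:
    # Hypothetical container, not a Stanbol or Spotlight type.
    surface: str                  # text as it appears in the document
    entity: Optional[str] = None  # current top-ranked entity, if any


def propagate_full_names(mentions: List[Mention]) -> List[Mention]:
    """Assign single-token names to the entity of an earlier full name.

    Mentions must be given in document order. A single-token mention
    (e.g. "Rossi") takes over the entity of the closest preceding
    multi-token mention that contains it as a token (e.g. "Valentino
    Rossi"). All other mentions keep the entity they already had.
    """
    resolved: List[Mention] = []
    for mention in mentions:
        entity = mention.entity
        tokens = mention.surface.split()
        if len(tokens) == 1:
            # Look backwards for the nearest resolved full name.
            for previous in reversed(resolved):
                prev_tokens = previous.surface.split()
                if (len(prev_tokens) > 1
                        and tokens[0] in prev_tokens
                        and previous.entity is not None):
                    entity = previous.entity
                    break
        resolved.append(Mention(mention.surface, entity))
    return resolved


if __name__ == "__main__":
    mentions = [
        Mention("Valentino Rossi", "Valentino Rossi (racer)"),
        # Top-ranked suggestion for the bare surname, as in David's example.
        Mention("Rossi", "Daniele De Rossi (soccer player)"),
    ]
    for m in propagate_full_names(mentions):
        print(f"{m.surface} -> {m.entity}")
```

In this sketch the bare "Rossi" ends up linked to "Valentino Rossi (racer)" because the full name appears earlier in the text. As Pablo notes, this is a crude assumption: it would mislink a later "Rossi" that really refers to a different person.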