Re: Entity Disambiguation Engine

Pablo N. Mendes Mon, 03 Sep 2012 08:22:39 -0700

Yes, TAC-KBP is one of the good places to go for Entity Linking (Targeted
Disambiguation). There is also a similar effort for co-reference resolution
that has been running since the nineties -- see MUC-6 (1995), MUC-7 (1997),
ACE NIST (2004). More recently, CoNLL shared tasks:
http://aclweb.org/portal/content/conll-2012-shared-task
http://conll.cemantix.org/2011/program.html


But if you go to ACL, EMNLP, NAACL, CoNLL and COLING proceedings and look
for coreference, disambiguation, entity linking, etc. you will find a pile
of papers to keep you entertained for weeks. :)

Cheers,
Pablo

MUC-6. 1995. Coreference Task Definition. In Proceedings of the Sixth
Message Understanding Conference (MUC-6).
MUC-7. 1997. Coreference Task Definition. In Proceedings of the Seventh
Message Understanding Conference (MUC-7).
NIST. 2004. The ACE Evaluation Plan. NIST.


On Mon, Sep 3, 2012 at 5:09 PM, Rafa Haro <rh...@zaizi.com> wrote:

> Hi David,
>
> There are a lot of good papers regarding entity disambiguation. As Pablo
> says, co-reference resolution and cross-document co-reference resolution
> are closely related fields, and you can surely find good surveys about them.
> Entity Disambiguation against Knowledge Bases is, perhaps, a more recent
> research interest. Of course, one of the most widely used KBs is Wikipedia.
> Check out this link:
>
> http://www.nist.gov/tac/2012/KBP/index.html
>
> Knowledge Base Population is a special track to promote research in
> automated systems that discover information about named entities as found
> in a large corpus and incorporate this information into a knowledge base.
> One of its tasks is Entity Linking, which in summary consists of linking a
> mention of an entity to its corresponding KB entry. This year is the
> fourth edition of the competition.
>
> Although some KBP systems are tuned very specifically to the competition
> datasets, they usually build on generic state-of-the-art approaches.
>
> Regards
>
> On 03/09/12 16:40, David Riccitelli wrote:
>
>> Thanks Pablo,
>>
>> I find the disambiguation subject very challenging and intriguing. By any
>> chance, do you (or anyone) have any pointers to some presentations,
>> background documentation or lectures about disambiguation?
>>
>> I would like to keep this topic alive as disambiguation can truly make a
>> difference in our implementations.
>>
>> Best regards,
>> David
>>
>> On Mon, Sep 3, 2012 at 5:31 PM, Pablo N. Mendes <pablomen...@gmail.com> wrote:
>>
>>> Hi David,
>>> The challenge you described is usually referred to as "(in-document)
>>> coreference resolution". It is closely related to the entity
>>> disambiguation problem, as entity disambiguation can be seen as
>>> cross-document coreference resolution (by using identifiers from a
>>> pre-established KB). However, I think it's worth thinking of it separately
>>> from (but in close connection with) the targeted entity disambiguation
>>> problem. This is because there are many alternatives with pros and cons,
>>> including:
>>> 1. clustering mentions at recognition time and then disambiguating,
>>> 2. clustering mentions after disambiguating, or
>>> 3. jointly disambiguating/clustering.
>>>
>>> In DBpedia Spotlight we use a very simple heuristic rule: if a first name
>>> or last name is spotted, we look backwards for a full name and assign
>>> everyone in the chain to the same entity [1]. It is a very crude
>>> assumption, but works quite well in practice.
>>>
>>> Cheers,
>>> Pablo
>>>
>>> [1]
>>> https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/core/src/main/scala/org/dbpedia/spotlight/filter/annotations/CoreferenceFilter.scala
>>>
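The look-backwards heuristic described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in CoreferenceFilter.scala: the `Mention` tuple shape, the function name `propagate_full_names` and the entity identifiers are all hypothetical, and "full name" is approximated as any earlier multi-token mention whose first or last token matches.

```python
from typing import List, Optional, Tuple

# Hypothetical data shape: (surface form, entity id or None if unresolved).
Mention = Tuple[str, Optional[str]]


def propagate_full_names(mentions: List[Mention]) -> List[Mention]:
    """Reassign single-token mentions to the closest earlier full-name entity."""
    resolved: List[Mention] = []
    for surface, entity in mentions:
        tokens = surface.split()
        if len(tokens) == 1:
            # Look backwards for a multi-token mention whose first or last
            # token equals this one (a crude "first name or last name" test).
            for prev_surface, prev_entity in reversed(resolved):
                prev_tokens = prev_surface.split()
                if (
                    len(prev_tokens) > 1
                    and prev_entity is not None
                    and tokens[0] in (prev_tokens[0], prev_tokens[-1])
                ):
                    entity = prev_entity
                    break
        resolved.append((surface, entity))
    return resolved


# Example from this thread; the entity ids are made up for illustration.
mentions = [
    ("Valentino Rossi", "Valentino_Rossi"),
    ("MotoGP", "MotoGP"),
    ("Rossi", "Daniele_De_Rossi"),
]
result = propagate_full_names(mentions)
```

In this sketch the later "Rossi" mention inherits the entity of the earlier "Valentino Rossi" mention regardless of its own top-ranked suggestion, which is exactly why the assumption is crude: a text that mentions two different people named Rossi would be merged into one.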
>>> On Thu, Aug 23, 2012 at 5:27 PM, David Riccitelli <da...@insideout.io> wrote:
>>>> Thanks Kritarth,
>>>>
>>>> Let me discuss another case, with another example: there's a text like
>>>> this: "Valentino Rossi won the MotoGP. Everybody loves Rossi.".
>>>>
>>>> Right now the enhancer correctly identifies "Valentino Rossi (racer)" in
>>>> the TextAnnotation "Valentino Rossi", while it makes different suggestions
>>>> for the TextAnnotation "Rossi", sorted by ranking (unfortunately Valentino
>>>> Rossi is not the first):
>>>>   - "Daniele De Rossi (soccer player)"
>>>>   - "Vasco Rossi (singer)"
>>>>   - "Valentino Rossi (racer)"
>>>>
>>>> In this case would the disambiguation engine boost the score of the
>>>> EntityAnnotation "Valentino Rossi (racer)"?
>>>>
>>>> BR,
>>>> David
>>>>
>>>> On Thu, Aug 23, 2012 at 4:43 PM, kritarth anand <kritarth.an...@gmail.com> wrote:
>>>>> Hi David,
>>>>> Thanks for your interest.
>>>>>
>>>>> What would a sentence like this yield, "Paris is not the city in United
>>>>> States" ?
>>>>>
>>>>> It would yield Paris, Texas too. Well, that is one of the reasons the
>>>>> problem is very hard.
>>>>>
>>>>> Kritarth
>>>>>
>>>>> On Thu, Aug 23, 2012 at 7:06 PM, David Riccitelli <da...@insideout.io> wrote:
>>>>>> What would a sentence like this yield, "Paris is not the city in
>>>>>> United States" ?
>>>>>>
>>>>>> On Thu, Aug 23, 2012 at 4:23 PM, kritarth anand <kritarth.an...@gmail.com> wrote:
>>>>>>> Dear members of Stanbol community,
>>>>>>>
>>>>>>> I would like to discuss the next few iterations of the Disambiguation
>>>>>>> Engine. A few versions of the engine have been prepared, and I briefly
>>>>>>> describe them below. I hope to become a permanent committer for
>>>>>>> Stanbol if my contribution is considered after this GSoC period. I will
>>>>>>> be committing the code versions and attaching a patch in JIRA soon.
>>>>>>>
>>>>>>> 1. How the disambiguation engine problem was approached.
>>>>>>>   For certain text annotations there might be many entity annotations
>>>>>>>   mapped. It was required to rank them in the order of their likelihood.
>>>>>>>   Example: "Paris is a small city in the United States."
>>>>>>>
>>>>>>> a. Take "Paris" in this sentence without disambiguation (using DBpedia
>>>>>>> as vocabulary). There are three entity annotations mapped: 1. Paris,
>>>>>>> France, 2. Paris, Texas, 3. Paris, *Something*. (The entity mapped with
>>>>>>> the highest fise:confidence is Paris, France.)
>>>>>>> b. Now, how would disambiguation by humans take place? On reading the
>>>>>>> line, an individual thinks of the context the text is referring to.
>>>>>>> Doing so, they realize that the text talks about Paris and also about
>>>>>>> the United States. The Paris mentioned here is more like Paris, Texas
>>>>>>> (which is in the United States) and therefore must refer to it.
>>>>>>> c. The approach followed in the implementation takes inspiration from
>>>>>>> the example and roughly follows the pseudocode below.
>>>>>>>      for (K : TextAnnotations)
>>>>>>>      {    List EntityAnnotations = getEntityAnnotationsRelated(K);
>>>>>>>          Context = getContextInformation(K);
>>>>>>>
>>>>>>>          List Results = queryMLTVocabularies(K, Context);
>>>>>>>          updateConfidences(Results, EntityAnnotations);
>>>>>>>      }
>>>>>>>
>>>>>>> d. My current approach to handling disambiguation involved a lot of
>>>>>>> variations; however, for simplicity I'll talk only about differences in
>>>>>>> obtaining the "Context".
>>>>>>>
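To make the loop in the pseudocode above concrete, here is a minimal runnable sketch. It is not the engine's code: the similarity query against the vocabulary is swapped for a plain word-overlap count, and the `VOCABULARY` descriptions, the `0.1` boost weight and the function names `context_words` and `rerank` are all assumptions made up for the illustration.

```python
from typing import Dict, List, Set

# Toy stand-in for a vocabulary: entity label -> short description (made up).
VOCABULARY: Dict[str, str] = {
    "Paris, France": "capital city of France in Europe",
    "Paris, Texas": "small city in Texas in the United States",
}


def context_words(text: str) -> Set[str]:
    """Lower-cased words of a text with surrounding punctuation stripped."""
    return {word.strip('.,"').lower() for word in text.split()}


def rerank(text: str, candidates: Dict[str, float]) -> List[str]:
    """Boost each candidate's confidence by its word overlap with the text."""
    context = context_words(text)
    updated: Dict[str, float] = {}
    for label, confidence in candidates.items():
        overlap = len(context & context_words(VOCABULARY[label]))
        # Assumed update rule: a fixed bonus per shared word.
        updated[label] = confidence + 0.1 * overlap
    return sorted(updated, key=updated.get, reverse=True)


# Initial confidences are invented; Paris, France starts out ranked first.
candidates = {"Paris, France": 0.6, "Paris, Texas": 0.4}
ranking = rerank("Paris is a small city in the United States.", candidates)
```

With these toy descriptions the shared words "United" and "States" (plus a few stop words, which a real implementation would likely filter out) are enough to move Paris, Texas above Paris, France.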
>>>>>>> 2. The Context Procurement:
>>>>>>> a. All Entity Context: The context is decided by all the text
>>>>>>> annotations of the text. It shows good results for shorter texts, but
>>>>>>> introduces a lot of redundant annotations in longer ones, making the
>>>>>>> context less useful.
>>>>>>> b. All Link Context: The context is decided on the basis of the site or
>>>>>>> reference links associated with the text annotations, which of course
>>>>>>> may themselves need to be disambiguated. So it does not behave very
>>>>>>> well.
>>>>>>> c. Selection Context: The selection context basically contains the text
>>>>>>> one sentence before and after the current one. Another version worked
>>>>>>> with the Text Annotations in this region of text.
>>>>>>> d. Vicinity Entity Context: The vicinity annotation detection measures
>>>>>>> distance in the neighborhood of the text annotation.
>>>>>>>
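As an illustration of the Selection Context variant described above (the sentence containing the annotation plus one sentence before and after), a rough sketch could look like this. The regex sentence splitter is deliberately naive, and the function names and the sample text beyond the Rossi sentences are assumptions for the example, not part of the engine.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive splitter: breaks after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def selection_context(text: str, mention_start: int) -> str:
    """Return the sentence containing the offset plus its two neighbours."""
    sentences = split_sentences(text)
    position = 0
    for index, sentence in enumerate(sentences):
        start = text.index(sentence, position)
        end = start + len(sentence)
        if start <= mention_start < end:
            # Slice bounds are clamped, so first/last sentences work too.
            return " ".join(sentences[max(0, index - 1): index + 2])
        position = end
    return ""


# Sample text: the first two sentences come from this thread, the rest is
# invented to show that distant sentences are left out of the context.
text = (
    "Valentino Rossi won the MotoGP. Everybody loves Rossi. "
    "He lives in Italy. Paris is far away."
)
context = selection_context(text, text.index("Everybody"))
```

Compared with using every annotation in the document, a window like this keeps the context small for long texts, at the cost of missing evidence that sits further away from the mention.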
>>>>>>> 3. Future
>>>>>>> a. With a running POC of this engine, it can be used to create an
>>>>>>> advanced version like the Spotlight approach or one using the Markov
>>>>>>> Logic Networks discussed earlier.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> David Riccitelli
>>>>>>
>>>>>> InsideOut10 s.r.l.
>>>>>> P.IVA: IT-11381771002
>>>>>> Fax: +39 0110708239
>>>>>> ---
>>>>>> LinkedIn: http://it.linkedin.com/in/riccitelli
>>>>>> Twitter: ziodave
>>>>>> ---
>>>>>> Layar Partner Network:
>>>>>> http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1
>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> ---
>>> Pablo N. Mendes
>>> http://pablomendes.com
>>> Events: http://wole2012.eurecom.fr
>>>
>>>
>>
>>
>
>



