Hi David,
There are a lot of good papers on entity disambiguation. As Pablo says,
coreference resolution and cross-document coreference resolution are
closely related fields, and you can certainly find good surveys about
them. Entity disambiguation against knowledge bases is perhaps a more
recent research interest; of course, one of the most widely used KBs is
Wikipedia. Check out this link:
http://www.nist.gov/tac/2012/KBP/index.html
Knowledge Base Population is a special track to promote research in
automated systems that discover information about named entities found
in a large corpus and incorporate that information into a knowledge
base. One of its tasks is Entity Linking, which in short consists of
linking a mention of an entity to its corresponding KB entry. This year
was the fourth edition of the competition. Although some KBP systems are
rather ad hoc, tailored to the competition datasets, they usually build
on generic state-of-the-art approaches.
Regards
On 03/09/12 16:40, David Riccitelli wrote:
Thanks Pablo,
I find the disambiguation subject very challenging and intriguing. By
any chance, do you (or anyone) have any pointers to presentations,
background documentation, or lectures about disambiguation?
I would like to keep this topic alive, as disambiguation can truly make
a difference in our implementations.
Best regards,
David
On Mon, Sep 3, 2012 at 5:31 PM, Pablo N. Mendes <pablomen...@gmail.com> wrote:
Hi David,
The challenge you described is usually referred to as "(in document)
coreference resolution". It is closely related to the entity disambiguation
problem, as entity disambiguation can be seen as cross-document coreference
resolution (by using identifiers from a pre-established KB). However I
think it's worth thinking of it separately from (but in close connection
with) the targeted entity disambiguation problem. This is because there are
many alternatives with pros and cons, including:
1. clustering mentions at recognition time and then disambiguating,
2. clustering mentions after disambiguating, or
3. jointly disambiguating/clustering.
In DBpedia Spotlight we use a very simple heuristic rule: if a first name
or last name is spotted, we look backwards for a full name and assign
everyone in the chain to the same entity [1]. It is a very crude
assumption, but it works quite well in practice.
Cheers,
Pablo
[1]
https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/core/src/main/scala/org/dbpedia/spotlight/filter/annotations/CoreferenceFilter.scala
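That backwards-chaining heuristic could be sketched roughly as follows. This is an illustrative simplification with hypothetical names, not the linked Scala implementation:

```python
def resolve_person_coreference(mentions):
    """Assign short person mentions (e.g. "Rossi") to the most recent
    preceding full-name mention that contains them as a token.
    `mentions` is a list of surface-form strings in document order;
    returns the list with short mentions replaced by their antecedents."""
    resolved = []
    for mention in mentions:
        antecedent = mention
        # A single-token mention may be a first or last name: look backwards
        # for an earlier multi-token mention that includes it as a token.
        if len(mention.split()) == 1:
            for earlier in reversed(resolved):
                if len(earlier.split()) > 1 and mention in earlier.split():
                    antecedent = earlier
                    break
        resolved.append(antecedent)
    return resolved

# "Rossi" is chained back to "Valentino Rossi":
print(resolve_person_coreference(["Valentino Rossi", "Rossi"]))
```

With `["Valentino Rossi", "Rossi"]` both mentions end up assigned to "Valentino Rossi", which is exactly the crude-but-effective behavior described above.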
On Thu, Aug 23, 2012 at 5:27 PM, David Riccitelli <da...@insideout.io> wrote:
Thanks Kritarth,
Let me discuss another case, with another example. There's a text like
this:
"Valentino Rossi won the MotoGP. Everybody loves Rossi."
Right now the enhancer correctly identifies "Valentino Rossi (racer)" in
the TextAnnotation "Valentino Rossi", while it makes different
suggestions for the TextAnnotation "Rossi", sorted by ranking
(unfortunately, with Valentino Rossi not being the first):
- "Daniele De Rossi (soccer player)"
- "Vasco Rossi (singer)"
- "Valentino Rossi (racer)"
In this case would the disambiguation engine boost the score of the
EntityAnnotation "Valentino Rossi (racer)"?
BR,
David
On Thu, Aug 23, 2012 at 4:43 PM, kritarth anand <kritarth.an...@gmail.com> wrote:
Hi David,
Thanks for your interest.
What would a sentence like this yield, "Paris is not the city in United
States" ?
It would yield Paris, Texas too. Well, that is one of the reasons the
problem is very hard.
Kritarth
On Thu, Aug 23, 2012 at 7:06 PM, David Riccitelli <da...@insideout.io> wrote:
What would a sentence like this yield: "Paris is not the city in United
States"?
On Thu, Aug 23, 2012 at 4:23 PM, kritarth anand <kritarth.an...@gmail.com> wrote:
Dear members of the Stanbol community,
I would like to discuss the next few iterations of the Disambiguation
Engine. A few versions of the engine have been prepared, and I will
briefly describe them below. I hope to become a permanent committer for
Stanbol if my contribution is considered after this GSoC period. I will
be committing the code versions soon and applying a patch to JIRA.
1. How the disambiguation problem was approached.
For certain text annotations there may be many entity annotations
mapped, and it was required to rank them in order of their likelihood.
Take the sentence: "Paris is a small city in the United States."
a. Consider "Paris" in this sentence without disambiguation (using
DBpedia as the vocabulary). There are three entity annotations mapped:
1. Paris, France; 2. Paris, Texas; 3. Paris, *Something*. (The entity
mapped with the highest fise:confidence is Paris, France.)
b. Now, how would disambiguation by a human take place? On reading the
line, an individual thinks of the context the text is referring to. In
doing so, he realizes that since the text talks about Paris and also
about the United States, the Paris mentioned here is more likely Paris,
Texas (which is in the United States), and the mention must therefore
refer to it.
c. The approach followed in the implementation takes inspiration from
this example and roughly follows the pseudocode below.
for (TextAnnotation k : textAnnotations) {
    List entityAnnotations = getEntityAnnotationsRelated(k);
    Context context = getContextInformation(k);
    List results = queryMLTVocabularies(k, context);
    updateConfidences(results, entityAnnotations);
}
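Outside Stanbol, the effect of that loop can be approximated with a simple bag-of-words overlap between each candidate entity's description and the surrounding context. Everything below (function names, the scoring rule) is an illustrative assumption, not the engine's actual code:

```python
def rerank_by_context(candidates, context_text):
    """Boost each candidate's confidence by the fraction of its description
    words that also appear in the surrounding context (illustrative rule).
    `candidates` is a list of (name, confidence, description) tuples."""
    context_words = set(context_text.lower().split())
    reranked = []
    for name, confidence, description in candidates:
        desc_words = set(description.lower().split())
        # Overlap in [0, 1]: how much of the description the context covers.
        overlap = len(desc_words & context_words) / max(len(desc_words), 1)
        reranked.append((name, confidence + overlap))
    # Highest adjusted confidence first.
    return sorted(reranked, key=lambda c: c[1], reverse=True)

candidates = [
    ("Paris, France", 0.6, "capital of France in Europe"),
    ("Paris, Texas", 0.4, "small city in the United States"),
]
context = "Paris is a small city in the United States"
print(rerank_by_context(candidates, context))
```

On the example sentence the overlap term pushes "Paris, Texas" above "Paris, France" even though the latter starts with the higher prior confidence, which is the behavior the pseudocode aims for.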
d. My current approach to handling disambiguation involved many
variations; for simplicity, I'll talk only about the differences in
obtaining the "Context".
2. Context procurement:
a. All Entity Context: the context is built from all the text
annotations of the text. It shows good results for shorter texts, but
introduces a lot of redundant annotations in longer ones, making the
context less useful.
b. All Link Context: the context is built from the site or reference
link associated with the text annotations, which of course may
themselves require disambiguation, so it does not behave very well.
c. Selection Context: the selection context basically contains the text
one sentence prior to and one sentence after the current one. Another
version also worked with the text annotations in this region of text.
d. Vicinity Entity Context: the vicinity annotation detection measures
distance in the neighborhood of the text annotation.
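The "Selection Context" variant (2c) could be sketched like this; the naive period-based sentence split and the function name are assumptions for illustration only:

```python
def selection_context(text, mention):
    """Return the sentence containing `mention` plus one sentence on either
    side, joined as a single context string. Uses a naive '.'-based sentence
    split; a real implementation would use a proper sentence detector."""
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    for i, sentence in enumerate(sentences):
        if mention in sentence:
            # Window clipped at the document boundaries.
            window = sentences[max(i - 1, 0):i + 2]
            return '. '.join(window)
    return ""  # mention not found anywhere in the text

text = ("Valentino Rossi won the MotoGP. Everybody loves Rossi. "
        "The season was thrilling.")
print(selection_context(text, "Everybody"))
```

For the mention "Everybody" this returns all three sentences, since the middle sentence gets one neighbor on each side; a mention in the first sentence would get only the first two.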
3. Future
a. With a running POC of this engine, it can be used to create an
advanced version like the Spotlight approach, or one using the Markov
Logic Networks discussed earlier.
--
David Riccitelli
********************************************************************************
InsideOut10 s.r.l.
P.IVA: IT-11381771002
Fax: +39 0110708239
---
LinkedIn: http://it.linkedin.com/in/riccitelli
Twitter: ziodave
---
Layar Partner Network:
http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1
********************************************************************************
--
---
Pablo N. Mendes
http://pablomendes.com
Events: http://wole2012.eurecom.fr