Hi David,
There are a lot of good papers on entity disambiguation. As Pablo says,
coreference resolution and cross-document coreference resolution are
closely related fields, and you can certainly find good surveys about
them. Entity disambiguation against knowledge bases is perhaps a more
recent research interest; of course, one of the most widely used KBs is
Wikipedia. Check out this link:
http://www.nist.gov/tac/2012/KBP/index.html
Knowledge Base Population is a special track to promote research in
automated systems that discover information about named entities found
in a large corpus and incorporate that information into a knowledge
base. One of its tasks is Entity Linking, which in short consists of
linking a mention of an entity to its corresponding KB entry. This year
was the fourth edition of the competition. Although some KBP systems are
rather ad hoc, tailored to the competition datasets, they usually build
on generic state-of-the-art approaches.
Regards
On 03/09/12 16:40, David Riccitelli wrote:
Thanks Pablo,
I find the disambiguation subject very challenging and intriguing. By
any chance, do you (or anyone) have any pointers to presentations,
background documentation, or lectures about disambiguation?
I would like to keep this topic alive, as disambiguation can truly make
a difference in our implementations.
Best regards,
David
On Mon, Sep 3, 2012 at 5:31 PM, Pablo N. Mendes <pablomen...@gmail.com> wrote:
Hi David,
The challenge you described is usually referred to as "(in document)
coreference resolution". It is closely related to the entity disambiguation
problem, as entity disambiguation can be seen as cross-document coreference
resolution (by using identifiers from a pre-established KB). However I
think it's worth thinking of it separately from (but in close connection
with) the targeted entity disambiguation problem. This is because there are
many alternatives with pros and cons, including:
1. clustering mentions at recognition time and then disambiguating,
2. clustering mentions after disambiguating, or
3. jointly disambiguating/clustering.
In DBpedia Spotlight we use a very simple heuristic rule: if a first name
or last name is spotted, we look backwards for a full name and assign
everyone in the chain to the same entity [1]. It is a very crude
assumption, but it works quite well in practice.
Cheers,
Pablo
[1]
https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/core/src/main/scala/org/dbpedia/spotlight/filter/annotations/CoreferenceFilter.scala
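That backwards-chaining heuristic could be sketched roughly as follows. This is an illustrative simplification with hypothetical names, not the linked Scala implementation:

```python
def resolve_person_coreference(mentions):
    """Assign short person mentions (e.g. "Rossi") to the most recent
    preceding full-name mention that contains them as a token.
    `mentions` is a list of surface-form strings in document order;
    returns the list with short mentions replaced by their antecedents."""
    resolved = []
    for mention in mentions:
        antecedent = mention
        # A single-token mention may be a first or last name: look backwards
        # for an earlier multi-token mention that includes it as a token.
        if len(mention.split()) == 1:
            for earlier in reversed(resolved):
                if len(earlier.split()) > 1 and mention in earlier.split():
                    antecedent = earlier
                    break
        resolved.append(antecedent)
    return resolved

# "Rossi" is chained back to "Valentino Rossi":
print(resolve_person_coreference(["Valentino Rossi", "Rossi"]))
```

With `["Valentino Rossi", "Rossi"]` both mentions end up assigned to "Valentino Rossi", which is exactly the crude-but-effective behavior described above.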
On Thu, Aug 23, 2012 at 5:27 PM, David Riccitelli <da...@insideout.io> wrote:
Thanks Kritarth,
Let me discuss another case, with another example. There's a text like
this:
"Valentino Rossi won the MotoGP. Everybody loves Rossi."
Right now the enhancer correctly identifies "Valentino Rossi (racer)" in
the TextAnnotation "Valentino Rossi", while it makes different
suggestions for the TextAnnotation "Rossi", sorted by ranking
(unfortunately, with Valentino Rossi not being the first):
- "Daniele De Rossi (soccer player)"
- "Vasco Rossi (singer)"
- "Valentino Rossi (racer)"
In this case would the disambiguation engine boost the score of the
EntityAnnotation "Valentino Rossi (racer)"?
BR,
David
On Thu, Aug 23, 2012 at 4:43 PM, kritarth anand <kritarth.an...@gmail.com> wrote:
Hi David,
Thanks for your interest.
What would a sentence like this yield, "Paris is not the city in United
States" ?
It would yield Paris, Texas too. Well, that is one of the reasons the
problem is very hard.
Kritarth
On Thu, Aug 23, 2012 at 7:06 PM, David Riccitelli <da...@insideout.io> wrote:
What would a sentence like this yield: "Paris is not the city in United
States"?
On Thu, Aug 23, 2012 at 4:23 PM, kritarth anand <kritarth.an...@gmail.com> wrote:
Dear members of the Stanbol community,
I would like to discuss the next few iterations of the Disambiguation
Engine. A few versions of the engine have been prepared, and I will
briefly describe them below. I hope to become a permanent committer for
Stanbol if my contribution is considered after this GSoC period. I will
be committing the code versions soon and applying a patch to JIRA.
1. How the disambiguation problem was approached.
For certain text annotations there may be many entity annotations
mapped, and it was required to rank them in order of their likelihood.
Take the sentence: "Paris is a small city in the United States."
a. Consider "Paris" in this sentence without disambiguation (using
DBpedia as the vocabulary). There are three entity annotations mapped:
1. Paris, France; 2. Paris, Texas; 3. Paris, *Something*. (The entity
mapped with the highest fise:confidence is Paris, France.)
b. Now, how would disambiguation by a human take place? On reading the
line, an individual thinks of the context the text is referring to. In
doing so, he realizes that since the text talks about Paris and also
about the United States, the Paris mentioned here is more likely Paris,
Texas (which is in the United States), and the mention must therefore
refer to it.
c. The approach followed in the implementation takes inspiration from
this example and roughly follows the pseudocode below.
for (TextAnnotation k : textAnnotations) {
    List entityAnnotations = getEntityAnnotationsRelated(k);
    Context context = getContextInformation(k);
    List results = queryMLTVocabularies(k, context);
    updateConfidences(results, entityAnnotations);
}
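Outside Stanbol, the effect of that loop can be approximated with a simple bag-of-words overlap between each candidate entity's description and the surrounding context. Everything below (function names, the scoring rule) is an illustrative assumption, not the engine's actual code:

```python
def rerank_by_context(candidates, context_text):
    """Boost each candidate's confidence by the fraction of its description
    words that also appear in the surrounding context (illustrative rule).
    `candidates` is a list of (name, confidence, description) tuples."""
    context_words = set(context_text.lower().split())
    reranked = []
    for name, confidence, description in candidates:
        desc_words = set(description.lower().split())
        # Overlap in [0, 1]: how much of the description the context covers.
        overlap = len(desc_words & context_words) / max(len(desc_words), 1)
        reranked.append((name, confidence + overlap))
    # Highest adjusted confidence first.
    return sorted(reranked, key=lambda c: c[1], reverse=True)

candidates = [
    ("Paris, France", 0.6, "capital of France in Europe"),
    ("Paris, Texas", 0.4, "small city in the United States"),
]
context = "Paris is a small city in the United States"
print(rerank_by_context(candidates, context))
```

On the example sentence the overlap term pushes "Paris, Texas" above "Paris, France" even though the latter starts with the higher prior confidence, which is the behavior the pseudocode aims for.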
d. My current approach to handling disambiguation involved many
variations; for simplicity, I'll talk only about the differences in
obtaining the "Context".
2. Context procurement:
a. All Entity Context: the context is built from all the text
annotations of the text. It shows good results for shorter texts, but
introduces a lot of redundant annotations in longer ones, making the
context less useful.
b. All Link Context: the context is built from the site or reference
link associated with the text annotations, which of course may
themselves require disambiguation, so it does not behave very well.
c. Selection Context: the selection context basically contains the text
one sentence prior to and one sentence after the current one. Another
version also worked with the text annotations in this region of text.
d. Vicinity Entity Context: the vicinity annotation detection measures
distance in the neighborhood of the text annotation.
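The "Selection Context" variant (2c) could be sketched like this; the naive period-based sentence split and the function name are assumptions for illustration only:

```python
def selection_context(text, mention):
    """Return the sentence containing `mention` plus one sentence on either
    side, joined as a single context string. Uses a naive '.'-based sentence
    split; a real implementation would use a proper sentence detector."""
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    for i, sentence in enumerate(sentences):
        if mention in sentence:
            # Window clipped at the document boundaries.
            window = sentences[max(i - 1, 0):i + 2]
            return '. '.join(window)
    return ""  # mention not found anywhere in the text

text = ("Valentino Rossi won the MotoGP. Everybody loves Rossi. "
        "The season was thrilling.")
print(selection_context(text, "Everybody"))
```

For the mention "Everybody" this returns all three sentences, since the middle sentence gets one neighbor on each side; a mention in the first sentence would get only the first two.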
3. Future
a. With a running POC of this engine, it can be used to create an
advanced version like the Spotlight approach, or one using the Markov
Logic Networks discussed earlier.
--
David Riccitelli
********************************************************************************
InsideOut10 s.r.l.
P.IVA: IT-11381771002
Fax: +39 0110708239
---
LinkedIn: http://it.linkedin.com/in/riccitelli
Twitter: ziodave
---
Layar Partner Network:
http://www.layar.com/publishing/developers/list/?page=1&country=&city=&keyword=insideout10&lpn=1
********************************************************************************
--
---
Pablo N. Mendes
http://pablomendes.com
Events: http://wole2012.eurecom.fr