Hi John,

I agree with you, but DBTax is general-purpose, not domain-specific.
Cheers!

On 9/18/15 1:25 AM, John Flynn wrote:
I guess this is a "point-of-view" comment, but attempting to assign "correct"
types to entities seems upside-down. An ontology, consisting of specific classes, subclasses,
properties, and subproperties plus the specific relationships between them, should describe a specific
domain of interest. Once the ontology for that domain has been created, the process of
identifying and assigning the entities/instances that belong within that domain of interest can begin.
If the ontology is properly designed, it should be very clear which entities fit within that domain
of interest as well as where they fit.

John Flynn
http://semanticsimulations.com

-----Original Message-----
From: Marco Fossati [mailto:[email protected]]
Sent: Thursday, September 17, 2015 11:26 AM
To: Magnus Knuth
Cc: [email protected]; dbpedia-discussion
Subject: Re: [Dbpedia-discussion] DBtax questions

Hi Magnus and thanks for your interest,

Generally speaking, assigning "correct" types to entities is always a highly
subjective task.
From a strictly linguistic point of view, a classification taxonomy is itself
a very debatable way to describe the semantics of content expressed in natural
language: one should always keep contextual information in mind to fully
understand the sense of, e.g., a Wikipedia article.

That said, the main goal of DBTax is to assign as many types as possible,
provided that they are different from owl#Thing.
In this way, we can cluster entities with more meaningful types and query the 
knowledge base accordingly.

Of course, you could say that owl#Thing has 100% coverage, but does that make sense?
The claimed 99% coverage instead stems from a *set* of more specific types.
Naturally, such high recall comes at a cost in precision.
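To make the coverage point concrete, here is a minimal sketch of how type coverage changes depending on whether owl#Thing is counted. The entity names and their type assignments below are invented for illustration (they are not real DBTax output), and type_coverage is a hypothetical helper, not part of DBTax or DBpedia.

```python
# Hypothetical toy data: entity -> set of assigned type IRIs.
# These assignments are invented for illustration, not real DBTax output.
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"

def type_coverage(types_by_entity, exclude=frozenset()):
    """Fraction of entities having at least one type not listed in `exclude`."""
    if not types_by_entity:
        return 0.0
    covered = sum(1 for types in types_by_entity.values() if types - set(exclude))
    return covered / len(types_by_entity)

toy = {
    "Presidente": {OWL_THING, "http://dbpedia.org/dbtax/President"},
    "Pagoda": {OWL_THING, "http://dbpedia.org/dbtax/Pagoda"},
    "Liberty": {OWL_THING},
    "Horse": {OWL_THING},
}

print(type_coverage(toy))                       # 1.0: every entity is an owl#Thing
print(type_coverage(toy, exclude={OWL_THING}))  # 0.5: only specific types count
```

Counting owl#Thing trivially yields full coverage; only the second number says anything about how many entities received a meaningful, more specific type.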

On 9/17/15 4:04 PM, Magnus Knuth wrote:
One structural problem I noticed when looking at the approach
[http://jens-lehmann.org/files/2015/semantics_dbtax.pdf] is that most
(non-complex) categories contain an article with exactly the same name, e.g.
dbr:President dc:subject dbc:President. And indeed these resources are typed
accordingly, e.g. http://it.dbpedia.org/resource/Presidente is a
dbtax:President and http://it.dbpedia.org/resource/Pagoda is a dbtax:Pagoda.
That is obvious for a human, but is it the same for an algorithm? :-)

A type coverage of more than 99 percent is very suspicious, because I’d expect many more
resources in DBpedia to be untypeable. Why? A lot of articles in DBpedia describe very
abstract concepts, e.g. Liberty, Nationality, Social_inequality (well, you have the class
dbtax:Concept, but then what is not a concept?), or they describe classes
themselves, e.g. President, Country, Person, Plane (well, you have the class
dbtax:Classification, but it is not used as such
[http://it.dbpedia.org/sparql?default-graph-uri=&query=SELECT+*+%7B%3Fres+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fdbtax%2FClassification%3E%7D&format=text%2Fhtml&debug=on]).
For some articles it is arguable whether they are instances or classes, e.g. Volkswagen_Polo,
Horse.

I see that the classes you extracted are truly valuable for enriching the
DBpedia ontology, but they obviously need some tidying up and disambiguation.
I completely agree: I think we should merge DBTax into the DBpedia ontology 
mappings wiki to do so.
BTW, DBTax overlaps with the DBpedia ontology by more than 20%.

Cheers!



--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j
