Hi Magnus and thanks for your interest,
Generally speaking, assigning "correct" types to entities is always a
highly subjective task.
From a strictly linguistic point of view, a classification taxonomy is
itself a very debatable way to describe the semantics of content
expressed in natural language: one should always keep contextual
information in mind to fully understand the meaning of, e.g., a
Wikipedia article.
That said, the main goal of DBTax is to assign as many types as
possible, provided that they are different from owl#Thing.
In this way, we can cluster entities with more meaningful types and
query the knowledge base accordingly.
Of course, you can say that owl#Thing has 100% coverage, but does it
make sense?
The claimed 99% stems instead from a *set* of more specific types.
Naturally, such high recall comes at a cost in precision.
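To make the coverage argument concrete, here is a minimal Python sketch of how one could measure type coverage while ignoring owl#Thing. The toy `types` mapping and the `coverage` helper are illustrative assumptions, not DBTax code or data.

```python
OWL_THING = "http://www.w3.org/2002/07/owl#Thing"

def coverage(types, ignore=frozenset({OWL_THING})):
    """Fraction of entities that have at least one type outside `ignore`."""
    if not types:
        return 0.0
    typed = sum(1 for ts in types.values() if set(ts) - ignore)
    return typed / len(types)

# Hypothetical toy data: entity -> assigned types.
types = {
    "dbr:Bitcoin": [OWL_THING, "dbtax:Protocol", "dbtax:Payment"],
    "dbr:Pagoda": [OWL_THING, "dbtax:Pagoda"],
    "dbr:Liberty": [OWL_THING],
}

trivial = coverage(types, ignore=frozenset())  # owl#Thing counts as a type
specific = coverage(types)                     # owl#Thing is ignored
print(f"with owl#Thing: {trivial:.2f}, without: {specific:.2f}")
```

Counting owl#Thing always gives full coverage, so only the second figure says anything about how informative the assigned types are.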
On 9/17/15 4:04 PM, Magnus Knuth wrote:
One structural problem I noticed when looking at the approach
[http://jens-lehmann.org/files/2015/semantics_dbtax.pdf] is that most
(non-complex) categories contain an article with exactly the same name, e.g.
dbr:President dc:subject dbc:President. And indeed these resources are typed
accordingly, e.g. http://it.dbpedia.org/resource/Presidente is a
dbtax:President and http://it.dbpedia.org/resource/Pagoda is dbtax:Pagoda.
That is obvious for a human, but is it the same for an algorithm? :-)
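One rough way to spot such cases automatically is to compare the local names of an article and its category. The sketch below assumes prefixed names as plain strings; `local_name`, `self_named`, and the second pair in `pairs` are hypothetical, and only the dbr:President example comes from this thread.

```python
def local_name(curie):
    """Return the part after the prefix, e.g. 'dbc:President' -> 'President'."""
    return curie.split(":", 1)[1]

def self_named(pairs):
    """Yield (article, category) pairs that share the same local name."""
    for article, category in pairs:
        if local_name(article) == local_name(category):
            yield article, category

# (article, category) pairs as given by dc:subject.
pairs = [
    ("dbr:President", "dbc:President"),    # example from the thread
    ("dbr:Some_President", "dbc:Presidents"),  # hypothetical pair
]

flagged = list(self_named(pairs))
print(flagged)
```

Flagged articles are candidates for being the class itself rather than an instance of it, so they could be reviewed or excluded before typing.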
A type coverage of more than 99 percent is very suspicious, because I’d expect many more
resources in DBpedia to be untypeable. Why? A lot of articles in DBpedia describe very
abstract concepts, e.g. Liberty, Nationality, Social_inequality (well, you have the class
dbtax:Concept, but then again, what is not a concept?), or they describe classes
themselves, e.g. President, Country, Person, Plane (well, you have the class
dbtax:Classification, but it is not used as such
[http://it.dbpedia.org/sparql?default-graph-uri=&query=SELECT+*+%7B%3Fres+a+%3Chttp%3A%2F%2Fdbpedia.org%2Fdbtax%2FClassification%3E%7D&format=text%2Fhtml&debug=on]).
For some articles it is arguable whether they are instances or classes, e.g.
Volkswagen_Polo, Horse.
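For readers who do not want to decode the link above by eye, this short Python sketch extracts the SPARQL query from it using only the standard library. The URL is copied from the message; the variable names are illustrative.

```python
from urllib.parse import urlsplit, parse_qs

url = (
    "http://it.dbpedia.org/sparql?default-graph-uri=&query=SELECT+*+%7B%3Fres+a+"
    "%3Chttp%3A%2F%2Fdbpedia.org%2Fdbtax%2FClassification%3E%7D"
    "&format=text%2Fhtml&debug=on"
)

# parse_qs percent-decodes values and turns '+' into spaces.
params = parse_qs(urlsplit(url).query)
sparql = params["query"][0]
print(sparql)
```

The decoded query simply selects every resource typed as dbtax:Classification.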
I see that the classes you extracted are truly valuable for enriching the
DBpedia ontology, but it obviously needs some tidying up and disambiguation.
I completely agree: I think we should merge DBTax into the DBpedia
ontology mappings wiki to do so.
BTW, DBTax overlaps with the DBpedia ontology by more than 20%.
Cheers!
Best regards and thanks for this work.
Magnus
On 07.09.2015 at 15:33, Marco Fossati <[email protected]> wrote:
Hi Vladimir and thanks for the feedback!
On 9/7/15 2:57 PM, Vladimir Alexiev wrote:
Hi Marco!
A couple of questions about DBtax:
- could you publish the ontology in a resolvable way?
E.g. http://it.dbpedia.org/resource/Bitcoin/html has rdf:type
http://dbpedia.org/dbtax/Term but that doesn’t resolve.
Sure, it's in the list of things to be done.
- are there definitions of the classes?
Since DBTax is an automatic approach, it is difficult to automatically
generate human-readable definitions.
I believe this can be crowdsourced to the DBpedia mappings wiki, what do
you think?
E.g. Bitcoin has these classes, but I think those marked with !! are wrong
and those with ?? are hard to judge until we have a definition for them:
dbtax: Page !!
dbtax: Article !!
dbtax: Protocol
dbtax: Communication
dbtax: Term ??
dbtax: Payment
Also looking in http://it.dbpedia.org/downloads/dbtax/T-Box.ttl, it's hard to
tell what these mean:
dbtax:Center rdfs:label "Center"@en ; rdfs:subClassOf dbtax:Art .
dbtax:Pagoda rdfs:label "Pagoda"@en ; rdfs:subClassOf dbtax:Art .
dbtax:Term rdfs:label "Term"@en ; rdfs:subClassOf dbtax:Art .
Cheers!
Cheers,
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j
------------------------------------------------------------------------------
_______________________________________________
Dbpedia-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion