Hi guys,

the problem of DBpedia entity types is not a new one. I have discussed some of 
the problems in the paper [1]. 

In my opinion it is pretty hard to establish the correct type of an entity 
using only one source of information. DBpedia uses infoboxes (both the infobox 
types and, more recently, type inference based on individual properties), 
YAGO uses categories, while Tipalo uses first sentences. But each method has its 
inherent difficulties.
In our recent research we try to combine these methods in order to provide the most 
accurate typing of DBpedia entities. The intermediate results are available at 
[2]. They are still not perfect, but we are making progress. E.g. we do not 
have "Bachelor of Arts", but we do have "Bachelor of Arts in Applied Psychology" 
and similar, which are classified as EducationalDegree. We use Cyc and Umbel 
concepts as types of the entities. 

If anyone is interested in that research, I am eager to discuss it during the 
forthcoming ISWC in Italy.

Cheers,
Aleksander

[1] A. Pohl, Classifying the Wikipedia Articles into the OpenCyc Taxonomy [in:] 
Proceedings of the Web of Linked Entities Workshop in conjunction with the 11th 
International Semantic Web Conference, Giuseppe Rizzo, Pablo Mendes, Eric 
Charton, Sebastian Hellmann, Aditya Kalyanpurs (eds.), pp. 5-16, ISSN: 
1613-0073. 

[2] http://klon.wzks.uj.edu.pl/wiki-types/


---- On Mon, 13 Oct 2014 17:51:07 +0200 Heiko 
Paulheim <[email protected]> wrote ---- 


  Hi Valentina,
 
 I am not sure whether I understand you correctly. There might be cases of 
metonymy in DBpedia, but as far as I can see, Wikipedia is usually quite good 
at separating them via disambiguation pages, so I am not sure whether there are 
many examples.
 
 The problem with the degrees, as far as I can tell, is not a metonymy one 
(degrees are just degrees, I have never seen them used to refer to a 
university), but simply a series of shortcomings in DBpedia. What happens here 
inside DBpedia is the following:
 * First, we find an infobox which says that someone's almaMater is, say, 
"Princeton University (B.A.)". Both Princeton and B.A. are linked to the 
respective Wikipedia pages.
 * The extraction framework extracts two statements from that: 
 PersonX almaMater Princeton_University, and
 PersonX almaMater Bachelor_of_Arts
 (the second one being an error, which is very hard to avoid in the general 
case)
 * Since that happens a few times, we infer that Bachelor_of_Arts is a 
University.
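
 The extraction step in the list above can be sketched as follows. This is only 
an illustration under assumptions, not the actual extraction framework: the 
function name extract_object_triples, the regular expression and the wiki 
markup of the infobox value are all made up for the example.

```python
import re

# Matches [[Target]] and [[Target|label]]; captures only the link target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")

def extract_object_triples(subject, prop, raw_value):
    """Naive extraction (hypothetical): one triple per wiki link in the value."""
    triples = []
    for target in WIKILINK.findall(raw_value):
        resource = target.strip().replace(" ", "_")
        triples.append((subject, prop, resource))
    return triples

# Assumed markup behind an infobox value rendered as "Princeton University (B.A.)".
value = "[[Princeton University]] ([[Bachelor of Arts|B.A.]])"
triples = extract_object_triples("PersonX", "almaMater", value)
for triple in triples:
    print(triple)
```

 Both links become objects of almaMater, because nothing in the markup says 
that the second link only qualifies the first.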
 
 So in that case, I think it's purely a DBpedia problem. If you are aware of 
any actual cases of metonymy, however, I am curious to hear about that.
 
 All the best,
 Heiko
 
 
 
 On 13.10.2014 16:33, Valentina Presutti wrote:
 
 Hi Heiko, 
 
 thanks for the prompt reply and the explanation.
 However, the interesting thing is that these entities are clearly used with 
more than one sense (at least in the US culture), so the issue comes from this 
fact originally in my opinion.
 I mentioned two cases here, but if you check you can see that all these types 
of entities (Degrees) have the same problem.
 
 
 My suggestion (if that can help) is to identify such cases of metonymy and 
handle them specially: create as many distinct entities as there are senses.
 
 
 However, the Wikipedia page of such entities defines them as degrees…not sure 
if this can be useful to notice for you. 
 
 
 Valentina
 
  On 13 Oct 2014, at 09:03, Heiko Paulheim 
<[email protected]> wrote:
 
  Hi Valentina,
 
 (and CCing the DBpedia discussion list)
 
 this is an effect of the heuristic typing we employ in DBpedia [1]. It works 
correctly in many cases, and sometimes it fails - as for these examples (the 
classic tradeoff between coverage and precision). 
 
 To briefly explain how the error comes into existence: we look at the 
distribution of types that occur for the ingoing properties of an untyped 
instance. For dbpedia:Bachelor_of_Arts, there are, among others, 208 ingoing 
properties with the predicate dbpedia-owl:almaMater (which is already 
questionable). For that predicate, 87.6% of the objects are of type 
dbpedia-owl:University. So we have a strong pattern, with many supporting 
statements, and we conclude that dbpedia:Bachelor_of_Arts is a university. That 
mechanism, as I said, works reasonably well, but sometimes fails at single 
instances, like this one. For dbpedia:Academic_degree, you'll find similar 
questionable statements involving that instance, which mislead the heuristic 
typing algorithm.
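
 As a rough illustration of the idea, here is a minimal sketch. It is a 
simplification and not the weighting described in [1]: it just averages the 
per-predicate type distributions, weighted by the number of ingoing statements. 
The function name infer_types and the 0.5 threshold are assumptions, and the 
example uses only the almaMater figures quoted above, since the other ingoing 
predicates are not listed here.

```python
from collections import defaultdict

def infer_types(ingoing_counts, type_dist, threshold=0.5):
    """Hypothetical helper: support-weighted vote over ingoing predicates.

    ingoing_counts: predicate -> number of ingoing statements for the resource
    type_dist: predicate -> {type: fraction of that predicate's objects with the type}
    Returns the types whose averaged score reaches the (assumed) threshold.
    """
    total = sum(ingoing_counts.values())
    if total == 0:
        return {}
    scores = defaultdict(float)
    for predicate, count in ingoing_counts.items():
        for rdf_type, fraction in type_dist.get(predicate, {}).items():
            scores[rdf_type] += count * fraction / total
    return {t: s for t, s in scores.items() if s >= threshold}

# Figures from the message: 208 ingoing almaMater statements, and 87.6% of
# almaMater objects typed as University.
ingoing = {"dbpedia-owl:almaMater": 208}
dist = {"dbpedia-owl:almaMater": {"dbpedia-owl:University": 0.876}}
inferred = infer_types(ingoing, dist)
print(inferred)
```

 With these inputs the only candidate is dbpedia-owl:University, which is the 
wrong conclusion drawn for dbpedia:Bachelor_of_Arts.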
 
 With the 2014 release, we further tried to reduce errors like these by 
filtering common nouns using WordNet before assigning types to instances, but 
both "Academic degree" and "Bachelor of Arts" escaped our nets here :-(
 
 The public DBpedia endpoint loads both the infobox based types and the 
heuristic types. If you need a "clean" version, I advise you to set up a local 
endpoint and load only the infobox based types into it.
 
 Best,
 Heiko
 
 [1] http://www.heikopaulheim.com/documents/iswc2013.pdf
 
 
 
 
 On 13.10.2014 02:42, Valentina Presutti wrote:
 
 Dear all, 
 
 I noticed that dbpedia:Bachelor_of_Arts, as well as other similar entities 
(dbpedia:Bachelor_of_Engineering, dbpedia:Bachelor_of_Science, etc.), is typed 
as dbpedia-owl:University
 I would expect a type like “Academic Degree” but if you look at
 dbpedia:Academic_Degree, its type is again dbpedia-owl:University
 
 
 however, its definition is (according to dbpedia):
 
 
 "An academic degree is a college or university diploma, often associated with 
a title and sometimes associated with an academic position, which is usually 
awarded in recognition of the recipient having either satisfactorily completed 
a prescribed course of study or having conducted a scholarly endeavour deemed 
worthy of his or her admission to the degree. The most common degrees awarded 
today are associate, bachelor's, master's, and doctoral degrees."
 
 
 Showing that there are at least two different meanings associated with the 
term: college/university and title.
 I think that the different meanings should be separated so as to allow 
applications to refer to the different entities: a university or a title.
 
 
 At least for me this causes errors in automatic relation extraction...
 
 
 Wdyt?
 
 
 Valentina
  
 -- 
 Prof. Dr. Heiko Paulheim
 Data and Web Science Group
 University of Mannheim
 Phone: +49 621 181 2646
 B6, 26, Room C1.08
 D-68159 Mannheim
 Mail: [email protected]
 Web: http://www.heikopaulheim.com/
  
 
 
  
_______________________________________________
Dbpedia-discussion mailing list 
[email protected] 
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion 



