Hi Markus,

You asked "who is creating all these [subclass of] statements and how is
this done?"

The class hierarchy in
http://tools.wmflabs.org/wikidata-todo/tree.html?q=Q35120&rp=279&lang=en shows
a few relatively large subclass trees for specialist domains,
including molecular biology and mineralogy.  The several thousand subclass
of claims under 'gene' and 'protein' were created by members of
WikiProject Molecular biology (WD:MB), based on discussions in [1] and
[2].  The decision to use P279 instead of P31 there was based on the fact
that the "is-a" relation in Gene Ontology maps to rdfs:subClassOf, which
P279 is based on.  The claims were added by a bot [3], with input from
WD:MB members.  The data ultimately comes from external biological
databases.

A glance at the mineralogy class hierarchy indicates it has been
constructed by WikiProject Mineralogy [4] members through non-bot edits.  I
imagine most of the other subclass of claims are made manually or
semi-automatically, outside specific WikiProject efforts.  In other words, I
think most of the other P279 claims are added by Wikidata users going into
the UI and building usually-reasonable concept hierarchies on domains
they're interested in.  I've worked on constructing class hierarchies for
health problems (e.g. diseases and injuries) [5] and medical procedures [6]
based on classifications like ICD-10 [7] and assertions and templates on
Wikipedia (e.g. [8]).

It's not incredibly surprising to me that Wikidata has about 36,000
subclass of (P279) claims [9].  The property has been around for over a
year and is a regular topic of discussion [10] along with instance of
(P31), which has over 6,600,000 claims.

You noted a dubious subclass of claim for 'House of Staufen'
(Q130875).  I agree that instance of would probably be the better
membership property to use there.  Such questionable usage of P279 is
probably uncommon, but definitely not singular.  The dynasty class
hierarchy shows 13 dubious cases at the moment [11].  I would guess less
than 5% of subclass of claims have that kind of issue, where instance of
would make more sense.  I think there are probably vastly more cases of the
converse: instance of being used where subclass of would make more sense.
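
To make your "how many instances of 'House of Staufen' are there?" test
concrete, here is a minimal Python sketch of one possible review heuristic.
It is my own assumption, not something the tree tool or any existing bot
does: walk the P279 tree below a root class and list leaf classes that have
neither subclasses nor instances.  The edge data is a hypothetical toy set;
only the 'House of Staufen' and 'dynasty' labels come from this thread.  An
empty leaf class is not necessarily wrong, so the output is a list to check
by hand, not a verdict.

```python
from collections import deque

# Hypothetical toy P279 edges: child class -> list of parent classes.
SUBCLASS_OF = {
    "House of Staufen": ["dynasty"],  # the dubious claim discussed above
    "class_A": ["dynasty"],           # hypothetical
    "class_B": ["class_A"],           # hypothetical
}
# Hypothetical toy P31 edges: item -> list of classes.
INSTANCE_OF = {
    "item_1": ["class_B"],            # hypothetical
}

def subclass_descendants(root, subclass_of):
    """Return all classes below root via P279, plus a parent -> children map."""
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    seen, queue = set(), deque([root])
    while queue:  # breadth-first walk; 'seen' guards against cycles
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen, children

def review_candidates(root, subclass_of, instance_of):
    """Leaf classes under root with no subclasses and no instances.

    Such a 'class' may really be an individual that should use P31."""
    descendants, children = subclass_descendants(root, subclass_of)
    classes_with_instances = {c for cs in instance_of.values() for c in cs}
    return sorted(
        d for d in descendants
        if d not in children and d not in classes_with_instances
    )

print(review_candidates("dynasty", SUBCLASS_OF, INSTANCE_OF))
# ['House of Staufen']
```

class_A is skipped because it has a subclass, and class_B because it has an
instance; only the childless, instanceless leaf is flagged.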

As you probably know, P31 and P279 are intended to have the semantics of
rdf:type and rdfs:subClassOf per community decision.  A while ago I read a
bit about the ELK reasoner you were involved with [12], which makes use of
the seemingly class-centric OWL EL profile.  Do you have any plans to
integrate features of ELK with the Wikidata Toolkit [13]?  How do you see
reasoning engines using P31 and P279 in the future, if at all?
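
For concreteness, here is a minimal Python sketch of the two RDFS
entailments that the rdf:type/rdfs:subClassOf reading gives us: P279 is
transitive, and an instance of a class is also an instance of all of its
superclasses.  The data is a toy example: 'piano' and 'keyboard instrument'
echo your Q5994/Q52954 example, while the 'my_piano' item and the 'musical
instrument' parent are hypothetical.  A real reasoner such as ELK does far
more than this simple closure.

```python
from collections import deque

# Toy P279 data (illustrative only): class -> direct superclasses.
SUBCLASS_OF = {
    "piano": ["keyboard instrument"],
    "keyboard instrument": ["musical instrument"],  # hypothetical parent
}
# Toy P31 data (illustrative only): item -> direct classes.
INSTANCE_OF = {
    "my_piano": ["piano"],  # hypothetical item
}

def superclasses(cls, subclass_of):
    """All classes reachable from cls via P279 (reflexive-transitive closure)."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in subclass_of.get(current, []):
            if parent not in seen:  # also protects against P279 cycles
                seen.add(parent)
                queue.append(parent)
    return seen

def inferred_types(item, instance_of, subclass_of):
    """x P31 A and A P279* B  implies  x P31 B (rdfs:subClassOf semantics)."""
    types = set()
    for cls in instance_of.get(item, []):
        types |= superclasses(cls, subclass_of)
    return types

print(sorted(inferred_types("my_piano", INSTANCE_OF, SUBCLASS_OF)))
# ['keyboard instrument', 'musical instrument', 'piano']
```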

Thanks,
Eric

https://www.wikidata.org/wiki/User:Emw

[1] https://www.wikidata.org/wiki/WT:MB#Distinguishing_between_genes_and_proteins
[2] https://www.wikidata.org/wiki/WT:MB#Human.2Fmouse.2F..._ID
[3] https://www.wikidata.org/wiki/User:ProteinBoxBot.  Chinmay Nalk
(https://www.wikidata.org/wiki/User:Chinmay26) did all the work on this,
with input from WD:MB.
[4] https://www.wikidata.org/wiki/Wikidata:WikiProject_Mineralogy
[5] http://tools.wmflabs.org/wikidata-todo/tree.html?q=Q15281399&rp=279&lang=en
[6] http://tools.wmflabs.org/wikidata-todo/tree.html?q=Q796194&rp=279&lang=en
[7] http://apps.who.int/classifications/icd10/browse/2010/en
[8] https://en.wikipedia.org/wiki/Template:Surgeries
[9] https://www.wikidata.org/w/index.php?title=Wikidata:Database_reports/Popular_properties&oldid=125595374
[10] Examples include:
- https://www.wikidata.org/wiki/Wikidata:Project_chat#chemical_element
- https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2013/12#Top_of_the_subclass_tree
- https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/01#Question_about_classes.2C_and_.27instance_of.27_vs_.27subclass.27
[11] http://tools.wmflabs.org/wikidata-todo/tree.html?q=Q164950&rp=279&lang=en
[12] http://korrekt.org/page/The_Incredible_ELK
[13] https://www.mediawiki.org/wiki/Wikidata_Toolkit


On Mon, May 5, 2014 at 12:46 PM, Markus Kroetzsch <
markus.kroetz...@tu-dresden.de> wrote:

> Hi,
>
> I got interested in subclass of (P279) and instance of (P31) statements
> recently. I was surprised by two things:
>
> (1) There are quite a lot of subclass of statements: tens of thousands.
> (2) Many of them make a lot of sense, and (in particular) are not
> (obvious) copies of Wikipedia categories.
>
> My big question is: who is creating all these statements and how is this
> done? It seems too much data to be created manually, but I don't see
> obvious automated approaches either (and there are usually no references
> given).
>
> I also found some rare issues. "A subclass of B" should be read as "Every
> A is also a B". For example, we have "Every piano (Q5994) is also a
> keyboard instrument (Q52954)". Overall, the great majority of cases I
> looked at had remarkably sane modelling (which reinforces my big question).
>
> But there are still cases where "subclass of" is mixed up with "instance
> of". For example, Wikidata also says "Every 'House of Staufen' (Q130875) is
> also a dynasty (Q164950)". This is dubious -- how many instances of 'House
> of Staufen' are there? I guess we really want to say that "The House of
> Staufen is a(n instance of) dynasty." Is this a singular error or a
> systematic issue?
>
> I guess there is already a group of people who deal with such issues -- or
> it would be a miracle that things are in such a good shape already :-) I
> have read the talk page for subclass of, but that does not seem to explain
> the origin of all the data we have already. Pointers?
>
> Cheers,
>
> Markus
>
>
> _______________________________________________
> Wikidata-l mailing list
> Wikidata-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>