The statement "ethanol *instance of* chemical compound" is ontologically
incorrect.  Importantly, it is also incompatible with ChEBI, the most
widely-used chemistry ontology.

The matter of how to apply *instance of* (P31, rdf:type) and *subclass of*
(P279, rdfs:subClassOf) on Wikidata in relation to chemical entities has
been, as Thomas puts it, a long discussion [1-5].  Hopefully with a wider
audience and experts like Markus Krötzsch and Denny Vrandečić now
interested, we can come to a resolution at least in the particular domain
of chemical compounds.  Since it concerns interoperability with another
large Semantic Web project, I have copied Janna Hastings and Alan
Ruttenberg on this discussion.  Janna coordinates ChEBI.  Alan coordinates
BFO, the upper ontology used by ChEBI and many other major ontologies in
the natural sciences, like Gene Ontology and Disease Ontology.

Denny indicates how the statement "Porsche 356 *instance of* car" would be
incorrect in Wikidata even though "Porsche 356 *is a* car" is acceptable in
everyday speech.  Similarly, "ethanol *instance of* chemical compound" is
incorrect in Wikidata even though "ethanol *is a* chemical compound" is
acceptable in less formal contexts.

A key difference between talk about cars and talk about chemicals is that,
with cars, we have familiar terms like "car model" that distinguish
concrete instances (that *particular* car you see on the street) from
abstract "instances" (i.e. metaclasses, classes that are also instances,
the *kind* of car that you see on the street).  We do not have a well-known
term like "chemical model" or "chemical compound type" to distinguish
classes (types) of chemicals and instances (tokens) of chemicals.  When one
speaks of the properties of ethanol or hydrogen, it is understood that the
subject is *all concrete, particular, spatiotemporal tokens, i.e. instances*
of ethanol and hydrogen -- not just a specific ethanol molecule floating
in that container before you on a Saturday with friends, but all molecules
that we label "ethanol" everywhere.

Thus, in order to formally classify ethanol itself as opposed to some
particular ethanol molecule, we must say for an item like
http://www.wikidata.org/wiki/Q153: "ethanol *subclass of* chemical
compound" and not "ethanol *instance of* chemical compound".  (On Wikidata,
the statement is more precisely "ethanol *subclass of* alcohol", but it is
entailed from the statements "alcohol *subclass of* organic compound" and
"organic compound *subclass of* chemical compound" that "ethanol *subclass
of* chemical compound".)
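
As a minimal sketch of that entailment, here is a toy Python version.  The
dictionary stands in for Wikidata's direct *subclass of* (P279) statements;
the names `SUBCLASS_OF` and `superclasses` are purely illustrative and not
part of any Wikidata tooling.

```python
# Toy stand-in for direct "subclass of" (P279) statements.
# Illustrative only: this is not real Wikidata data access.
SUBCLASS_OF = {
    "ethanol": {"alcohol"},
    "alcohol": {"organic compound"},
    "organic compound": {"chemical compound"},
}

def superclasses(item, direct=SUBCLASS_OF):
    """Return every class reachable from `item` via subclass-of links."""
    seen = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in direct.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# "ethanol subclass of chemical compound" is never stated directly,
# but it follows from the three statements above.
entailed = "chemical compound" in superclasses("ethanol")
print(entailed)
```

Because *subclass of* is transitive, only the most specific statement needs
to be stored on the item; the broader ones can be derived.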

A common defense of statements like "ethanol *instance of* chemical
compound" is that Wikidata will never have items about any concrete
molecules of ethanol, so, since ethanol is a "leaf node" in our concept
taxonomy, it makes sense to state that ethanol is an instance.  That
interpretation of "instance" is short-sighted.  It precludes us from ever
talking about particular tokens of ethanol, or particular aggregates of
such objects, without overhauling our chemistry ontology.  Excluding
consideration of metaclasses like "chemical compound type", the fact that
an entity is a leaf node in a concept hierarchy is a necessary but not
sufficient condition for using *instance of*.

Another common suggestion is that we should state something like "ethanol
*instance of* chemical compound type" and "ethanol *subclass of* chemical
compound".

To see where that gets us, try wrapping your head around this:
https://commons.wikimedia.org/wiki/File:Atom_classes.svg.  Really, take a
look.  If we want Wikidata's concept hierarchy to be seen as dauntingly
complex, pervasively applying that kind of three-layer classification
scheme will do.

The kind of explicit metamodeling seen when punning things like cars and
car models, ships and ship classes, biological taxa and organisms, etc.
works reasonably well in certain domains.  But, while we hold that hammer
in one hand, we should be careful not to see everything as a nail.  Outside
domains that have established vocabulary for metaclasses, imposing explicit
metamodeling with statements like "ethanol *instance of* chemical compound
type" or "hydrogen *instance of* atom type" will strike users as unduly
complex.

Without such metamodeling, though, querying for a list of chemical
compounds becomes murkier.  Surely we would want to return "ethanol" and
not "organic compound" in such a list.  How about "alcohol"?  Relatedly, if
we don't state "oxygen *instance of* chemical element", then how can we
easily query for all the elements in the Periodic Table of Elements without
including in the results any potential subclasses of oxygen (e.g.,
isotopes of oxygen like oxygen-16, oxygen-17, etc.)?

There are ways to achieve that in SPARQL using rdfs:subClassOf / P279 /
*subclass of*, but they require adhering to certain conventions.  When faced with
requiring many potential query users to learn some Wikidata MetaObject
Protocol, though, I'm inclined to make some sacrifices for simplicity,
ontological correctness, and consistency with major existing ontologies.
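
To make the kind of convention at issue concrete, here is a rough Python
sketch rather than an actual SPARQL query.  It assumes one possible
convention, which is only an illustration and not something agreed on
Wikidata: every element is stated as a *direct* subclass of "chemical
element", and isotopes are stated as subclasses of their element.  The data
and function names are made up for the example.

```python
# Toy "subclass of" (P279) statements as (child, parent) pairs.
# Illustrative only: not pulled from Wikidata.
SUBCLASS_OF = [
    ("oxygen", "chemical element"),
    ("hydrogen", "chemical element"),
    ("oxygen-16", "oxygen"),
    ("oxygen-17", "oxygen"),
]

def direct_subclasses(cls, statements=SUBCLASS_OF):
    """Items stated directly as a subclass of `cls` (exactly one hop)."""
    return {child for child, parent in statements if parent == cls}

def all_subclasses(cls, statements=SUBCLASS_OF):
    """Items reachable through any number of subclass-of hops."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for child in direct_subclasses(current, statements):
            if child not in found:
                found.add(child)
                frontier.append(child)
    return found

# One hop gives the elements; the transitive query also pulls in isotopes.
elements = direct_subclasses("chemical element")
everything = all_subclasses("chemical element")
print(sorted(elements))
print(sorted(everything))
```

The one-hop query only gives the intended list as long as nobody inserts an
intermediate class (say, a grouping of elements) between an element and
"chemical element".  That is exactly the sort of convention query users
would have to learn and editors would have to maintain.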

In summary, this ball has been punted for over a year now.  Because of the
impasse in how to classify chemical entities, we now have showcase items
that have obvious problems like entailing that something is both a class
and an instance of chemical compound.  We need input from a wider group of
people knowledgeable about ontology or chemistry, ideally both.  Hopefully
with a Wikimedian in Residence at the Royal Society of Chemistry [6] we'll
get some more focused resources on this.  All major scientific ontologies
use *subclass of* (rdfs:subClassOf), not *instance of* (rdf:type), to
classify such things.  In my opinion, Wikidata should maintain technical
and philosophical compatibility with ontologies like ChEBI and remove
statements like "ethanol *instance of* chemical compound".  This would
improve interoperability between Wikidata and the rest of the Semantic Web.

Thanks,
Eric

https://www.wikidata.org/wiki/User:Emw

1. https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/07#Forth_and_back_conversions_of_items_between_class_and_instance
2. https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/05#chemical_element
3. https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Chemistry#Germanium_subclass_tree
4. https://www.wikidata.org/wiki/Wikidata:Project_chat/Archive/2014/07#Subclass_of_two_different_things
5. https://www.wikidata.org/wiki/Help_talk:Basic_membership_properties#Proposition_of_definition
6. http://pigsonthewing.org.uk/wikimedian-residence-royal-society-chemistry/