Yes, I think the problem of maintaining a multi-class data model within wikidata is a general problem. You could imagine similar scenarios in any domain.
Our particular gene/protein merge problem is specific to our work. It is not just one user (Fullerene), though; this has been happening for a while and many have participated. See e.g. the post here:
https://www.wikidata.org/wiki/User_talk:Andrawaag#ProteinBoxBot_Mistake.3F
and here:
https://www.wikidata.org/wiki/User_talk:DGtal#Merging_items

On Wed, Oct 28, 2015 at 10:47 AM, Finn Årup Nielsen <[email protected]> wrote:

> Do you think it is a general problem? The few merges that I checked were
> all done by Fullerene and s/he has now responded after Andrawaag made a
> note on the talk page https://www.wikidata.org/wiki/User_talk:Fullerene
>
> /Finn
>
> On 10/28/2015 06:07 PM, Benjamin Good wrote:
>
>> The Gene Wiki team is experiencing a problem that may suggest some areas
>> for improvement in the general wikidata experience.
>>
>> When our project was getting started, we had some fairly long public
>> debates about how we should structure the data we wanted to load [1].
>> These resulted in a data model that, we think, remains pretty much true
>> to the semantics of the data, at the cost of distributing information
>> about closely related things (genes, proteins, orthologs) across
>> multiple, interlinked items. Now, as long as these semantic links
>> between the different item classes are maintained, this is working out
>> great. However, we are consistently seeing people merging items that
>> our model needs to be distinct. Most commonly, we see people merging
>> items about genes with items about the protein product of the gene (e.g.
>> [2]). This happens nearly every day - especially on items related to
>> the more popular Wikipedia articles. (More examples [3])
>>
>> Merges like this, as well as other semantics-breaking edits, make it
>> very challenging to build downstream apps (like the wikipedia infobox)
>> that depend on having certain structures in place. My question to the
>> list is how to best protect the semantic models that span multiple
>> entity types in wikidata? Related to this, is there an opportunity for
>> some consistent way of explaining these structures to the community when
>> they exist?
>>
>> I guess the immediate solutions are to (1) write another bot that
>> watches for model-breaking edits and reverts them and (2) to create an
>> article on wikidata somewhere that succinctly explains the model and
>> links back to the discussions that went into its creation.
>>
>> It seems that anyone that works beyond a single entity type is going to
>> face the same kind of problems, so I'm posting this here in hopes that
>> generalizable patterns (and perhaps even supporting code) can be
>> realized by this community.
>>
>> [1]
>> https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Distinguishing_between_genes_and_proteins
>> [2] https://www.wikidata.org/w/index.php?title=Q417782&oldid=262745370
>> [3]
>> https://s3.amazonaws.com/uploads.hipchat.com/25885/699742/rTrv5VgLm5yQg6z/mergelist.txt
>>
>> _______________________________________________
>> Wikidata mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
> --
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/
>
> _______________________________________________
> Wikidata mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata
