Re: [Wikidata] Data model explanation and protection

Benjamin Good Wed, 28 Oct 2015 11:04:35 -0700

Yes, I think the problem of maintaining a multi-class data model within
Wikidata is a general problem.  You could imagine similar scenarios in any
domain.


Our particular gene/protein merge problem is specific to our work.  It is
not just one user (Fullerene), though; this has been happening for a while
and many users have participated.  See e.g. the post here:
https://www.wikidata.org/wiki/User_talk:Andrawaag#ProteinBoxBot_Mistake.3F
and here:
https://www.wikidata.org/wiki/User_talk:DGtal#Merging_items

On Wed, Oct 28, 2015 at 10:47 AM, Finn Årup Nielsen <[email protected]> wrote:

> Do you think it is a general problem? The few merges that I checked were
> all done by Fullerene, and s/he has now responded after Andrawaag left a
> note on the talk page https://www.wikidata.org/wiki/User_talk:Fullerene
>
>
> /Finn
>
>
> On 10/28/2015 06:07 PM, Benjamin Good wrote:
>
>> The Gene Wiki team is experiencing a problem that may suggest some areas
>> for improvement in the general Wikidata experience.
>>
>> When our project was getting started, we had some fairly long public
>> debates about how we should structure the data we wanted to load [1].
>> These resulted in a data model that, we think, remains pretty much true
>> to the semantics of the data, at the cost of distributing information
>> about closely related things (genes, proteins, orthologs) across
>> multiple, interlinked items.  Now, as long as these semantic links
>> between the different item classes are maintained, this is working out
>> great.  However, we are consistently seeing people merging items that
>> our model needs to be distinct.  Most commonly, we see people merging
>> items about genes with items about the protein product of the gene (e.g.
>> [2]).  This happens nearly every day, especially on items related to
>> the more popular Wikipedia articles. (More examples [3])
>>
>> Merges like this, as well as other semantics-breaking edits, make it
>> very challenging to build downstream apps (like the Wikipedia infobox)
>> that depend on having certain structures in place.  My question to the
>> list is how best to protect semantic models that span multiple
>> entity types in Wikidata.  Related to this, is there an opportunity for
>> some consistent way of explaining these structures to the community when
>> they exist?
>>
>> I guess the immediate solutions are to (1) write another bot that
>> watches for model-breaking edits and reverts them and (2) create an
>> article on Wikidata somewhere that succinctly explains the model and
>> links back to the discussions that went into its creation.
>>
>> It seems that anyone who works beyond a single entity type is going to
>> face the same kind of problem, so I'm posting this here in the hope that
>> generalizable patterns (and perhaps even supporting code) can be
>> developed by this community.
>>
>> [1] https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Distinguishing_between_genes_and_proteins
>> [2] https://www.wikidata.org/w/index.php?title=Q417782&oldid=262745370
>> [3] https://s3.amazonaws.com/uploads.hipchat.com/25885/699742/rTrv5VgLm5yQg6z/mergelist.txt
>>
>>
>> _______________________________________________
>> Wikidata mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> --
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/