Hello,

Very interesting idea. Just to feed the discussion, here is a very recent literature survey on data quality in Wikidata:
https://opensym.org/wp-content/uploads/2019/08/os19-paper-A17-piscopo.pdf

Cheers,
Ettore Rizza

On Sat, 24 Aug 2019 at 13:55, Uwe Jung <[email protected]> wrote:

> Hello,
>
> As the importance of Wikidata increases, so do the demands on the quality
> of the data. I would like to put the following proposal up for discussion.
>
> Two basic ideas:
>
> 1. Each Wikidata page (item) is scored after each edit. This score
>    should express different dimensions of data quality in a quickly
>    manageable way.
> 2. A property is created via which the item refers to the score value.
>    Certain qualifiers can be used for a more detailed description (e.g.
>    time of calculation, algorithm used to calculate the score value, etc.).
>
> The score value can be calculated either within Wikibase after each data
> change or "externally" by a bot. The calculation can draw on, among other
> things, the number of constraints, the completeness of references, the
> degree of completeness in relation to the underlying ontology, etc. There
> are already some interesting discussions on the question of data quality
> which can be used here (see
> https://www.wikidata.org/wiki/Wikidata:Item_quality;
> https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality, etc.).
>
> Advantages:
>
> - Users get a quick overview of the quality of a page (item).
> - SPARQL can be used to query only those items that meet a certain
>   quality level.
> - The idea would probably be relatively easy to implement.
>
> Disadvantages:
>
> - In a way, the data model is abused by generating statements that no
>   longer describe the item itself, but make statements about the
>   representation of this item in Wikidata.
> - Additional computing power must be provided for the regular
>   calculation of all changed items.
> - Only the quality of pages is referred to. If it is insufficient, the
>   changes still have to be made manually.
>
> I would now be interested in the following:
>
> 1. Is this idea suitable to effectively help solve existing quality
>    problems?
> 2. Which quality dimensions should the score value represent?
> 3. Which quality dimensions can be calculated with reasonable effort?
> 4. How should they be calculated and represented?
> 5. Which is the most suitable way to further discuss and implement
>    this idea?
>
> Many thanks in advance.
>
> Uwe Jung (UJung <https://www.wikidata.org/wiki/User:UJung>)
> www.archivfuehrer-kolonialzeit.de/thesaurus
>
> _______________________________________________
> Wikidata mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata
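
To make the scoring idea in the quoted proposal a bit more concrete, here is a minimal sketch in Python of how a bot might combine two of the suggested inputs (completeness of references and completeness relative to an expected set of properties) with a count of constraint violations. Everything in it is an assumption for illustration only: the item is a plain dict rather than a real Wikibase entity, `expected_properties` stands in for whatever the underlying ontology would require, and the equal weighting and the penalty formula are arbitrary choices, not an existing Wikidata algorithm.

```python
from typing import Dict, List, Set

# Hypothetical, simplified item: property ID -> list of statements,
# where each statement is a dict with a "references" list.
Item = Dict[str, List[dict]]


def reference_completeness(item: Item) -> float:
    """Share of statements that carry at least one reference."""
    statements = [s for stmts in item.values() for s in stmts]
    if not statements:
        return 0.0
    referenced = sum(1 for s in statements if s.get("references"))
    return referenced / len(statements)


def property_completeness(item: Item, expected_properties: Set[str]) -> float:
    """Share of the expected properties that have at least one statement."""
    if not expected_properties:
        return 1.0
    present = sum(1 for p in expected_properties if item.get(p))
    return present / len(expected_properties)


def quality_score(item: Item, expected_properties: Set[str],
                  constraint_violations: int = 0) -> float:
    """Average the two completeness measures, then damp the result
    by the number of constraint violations (arbitrary illustrative formula)."""
    refs = reference_completeness(item)
    props = property_completeness(item, expected_properties)
    penalty = 1.0 / (1 + constraint_violations)
    return round((refs + props) / 2 * penalty, 3)


# Toy example: two statements, one of them referenced;
# two of four expected properties are present.
example_item: Item = {
    "P31": [{"references": ["some source"]}],
    "P569": [{"references": []}],
}
expected = {"P31", "P569", "P19", "P21"}

print(quality_score(example_item, expected))                           # 0.5
print(quality_score(example_item, expected, constraint_violations=1))  # 0.25
```

A single number like this hides which dimension is weak, so storing the individual components (for example as qualifiers, as the proposal suggests) would probably be more useful than the aggregate alone.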
