TLDR: it would be useful; but extremely hard to create rules for every
domain.
>4. How to calculate and represent them?
imho: it depends on the data domain.
For geodata ( human settlements/rivers/mountains/... ) ( with GPS
coordinates ), my simple rules are:
- if it has a "local wikipedia page" or any big
lang["EN/FR/PT/ES/RU/.."] wikipedia page, then it is OK.
- if it is only in "cebuano" AND outside of the "cebuano BBOX" -> then ....
this is lower quality
- only:{shwiki+srwiki} AND outside of the "sh"&"sr" BBOX -> this is lower
quality
- only {huwiki} AND outside of the CentralEuropeBBOX -> this is lower quality
- geodata without a GPS coordinate -> ...
- ....
so my rules are based on wikipedia pages and language areas ... and I prefer
wikidata items with local wikipedia pages.
This is based on my experience adding Wikidata ID concordances to
NaturalEarth ( https://www.naturalearthdata.com/blog/ )
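The rules above could be sketched roughly like this. Everything here is a
hedged illustration: the bounding boxes are crude approximations I made up
for the sketch, the "major language" set and the item representation are
assumptions, not any agreed Wikidata scheme.

```python
# Sketch of the language/BBOX quality heuristic described above.
# BBOXes and the item structure are illustrative assumptions only.

MAJOR_WIKIS = {"enwiki", "frwiki", "ptwiki", "eswiki", "ruwiki"}

# Rough, made-up bounding boxes: (min_lon, min_lat, max_lon, max_lat)
BBOXES = {
    "cebwiki": (116.0, 4.0, 127.0, 21.0),   # Philippines, approx.
    "shwiki":  (13.0, 40.0, 24.0, 47.0),    # former Yugoslavia, approx.
    "srwiki":  (13.0, 40.0, 24.0, 47.0),
    "huwiki":  (8.0, 44.0, 28.0, 52.0),     # Central Europe, approx.
}

def in_bbox(lon, lat, bbox):
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def geo_quality(sitelinks, coord):
    """Classify a geodata item as 'ok' or 'lower' quality.

    sitelinks: set of wiki codes the item has pages on.
    coord: (lon, lat) tuple, or None if the item has no coordinate.
    """
    if coord is None:
        return "lower"          # geodata without a GPS coordinate
    if sitelinks & MAJOR_WIKIS:
        return "ok"             # has a big-language Wikipedia page
    lon, lat = coord
    # only small-wiki sitelinks: suspicious if ALL of them are
    # outside their home-language BBOX
    if sitelinks and all(
        wiki in BBOXES and not in_bbox(lon, lat, BBOXES[wiki])
        for wiki in sitelinks
    ):
        return "lower"
    return "ok"                 # local wikipedia page inside its area
```

So a cebuano-only item with coordinates in, say, North America would be
flagged as lower quality, while the same item inside the Philippines
BBOX would pass.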
>5. Which is the most suitable way to further discuss and implement this
idea?
imho: load the wikidata dump into a local database,
and create
- some "proof of concept" quality data indicators,
- some "meta" rules,
- some "real" statistics,
so the community can decide whether it is useful or not.
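As a minimal proof-of-concept sketch, the "real statistics" step could
stream the public JSON dump and count a couple of simple indicators. The
entity fields used here (`type`, `sitelinks`, `claims`, `references`)
follow the published wikidata-all.json dump layout; the two indicators
themselves are just examples, not a proposed standard.

```python
import gzip
import json

def iter_dump(path):
    """Yield entity dicts from a Wikidata JSON dump (one entity per line)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip().rstrip(",")
            if line and line not in ("[", "]"):
                yield json.loads(line)

def indicator_stats(entities):
    """Count two proof-of-concept quality indicators over entity dicts."""
    stats = {
        "items": 0,
        "no_sitelinks": 0,            # items with no wikipedia page at all
        "statements": 0,
        "unreferenced_statements": 0,  # statements lacking any reference
    }
    for entity in entities:
        if entity.get("type") != "item":
            continue
        stats["items"] += 1
        if not entity.get("sitelinks"):
            stats["no_sitelinks"] += 1
        for claim_list in entity.get("claims", {}).values():
            for stmt in claim_list:
                stats["statements"] += 1
                if not stmt.get("references"):
                    stats["unreferenced_statements"] += 1
    return stats

# usage (hypothetical dump filename):
# stats = indicator_stats(iter_dump("wikidata-20190819-all.json.gz"))
```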
Imre
Uwe Jung <[email protected]> wrote (on Sat, 24 Aug 2019, at
14:55):
> Hello,
>
> As the importance of Wikidata increases, so do the demands on the quality
> of the data. I would like to put the following proposal up for discussion.
>
> Two basic ideas:
>
> 1. Each Wikidata page (item) is scored after each editing. This score
> should express different dimensions of data quality in a quickly manageable
> way.
> 2. A property is created via which the item refers to the score value.
> Certain qualifiers can be used for a more detailed description (e.g. time
> of calculation, algorithm used to calculate the score value, etc.).
>
>
> The score value can be calculated either within Wikibase after each data
> change or "externally" by a bot. Among other things, the calculation can
> draw on: the number of constraints, completeness of references, degree of
> completeness in relation to the underlying ontology, etc. There are already
> some interesting discussions on the question of data quality which can be
> used here ( see https://www.wikidata.org/wiki/Wikidata:Item_quality;
> https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality, etc).
>
> Advantages
>
> - Users get a quick overview of the quality of a page (item).
> - SPARQL can be used to query only those items that meet a certain
> quality level.
> - The idea would probably be relatively easy to implement.
>
>
> Disadvantages:
>
> - In a way, the data model is abused: such statements no longer
> describe the item itself, but rather the representation of that
> item in Wikidata.
> - Additional computing power must be provided for the regular
> calculation of all changed items.
> - The score only describes the quality of a page. If the quality is
> insufficient, the improvements still have to be made manually.
>
>
> I would now be interested in the following:
>
> 1. Is this idea suitable to effectively help solve existing quality
> problems?
> 2. Which quality dimensions should the score value represent?
> 3. Which quality dimension can be calculated with reasonable effort?
> 4. How to calculate and represent them?
> 5. Which is the most suitable way to further discuss and implement
> this idea?
>
>
> Many thanks in advance.
>
> Uwe Jung (UJung <https://www.wikidata.org/wiki/User:UJung>)
> www.archivfuehrer-kolonialzeit.de/thesaurus
>
>
> _______________________________________________
> Wikidata mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>