Hoi,
Shopping for recognition for "your" project, and making it an issue that
has to affect everything else, is problematic given a few basic facts.
Wikidata has 59,284,641 items; this effort is about 7,500 people. They
drown in a sea of other people and items. Statistically, the numbers
involved are insignificant.

HOWEVER, when your effort has a practical application, it is that
application, the use of the data, that ensures that the data will be
maintained and, hopefully, that the quality of this subset is maintained.
When you want quality, static quality is achieved by restricting at the
gate; dynamic quality is achieved by making sure that the data is actually
used. Scholia is an example of functionality that supports existing data,
and everyone who uses it will see the flaws in the data. That is why we
need to import additional data, merge scientists when there are
duplicates, and add additional authorities.

Yes, we need to achieve quality results. They will be achieved when people
use the data, find its flaws, and consequently append and amend.
Recognition of quality is best given by supporting and highlighting the
application of our data, and particularly by being thankful for the
consequential updates we receive. The users who help us do better are our
partners; all others ensure our relevance.
Thanks,
      GerardM

On Wed, 28 Aug 2019 at 00:52, Magnus Sälgö <[email protected]> wrote:

> Uwe, I feel this is more and more important, both for quality and
> provenance and for communicating the quality of our data inside Wikidata.
>
> I have added perhaps the best source for biographies in Sweden, P3217, in
> Wikidata for 7,500 persons. Those 7,500 items are used in more than 200
> language versions of Wikipedia. We need to have a "layer" explaining that
> data confirmed with P3217 ("SBL from Sweden") has very high trust.
>
> See https://phabricator.wikimedia.org/T222142
>
> I can also see this quality problem: Nobelprize.org and Wikidata have
> more than 30 differences, and it is sometimes difficult to understand the
> quality of the sources in Wikidata. The fact that Nobelprize.org has no
> sources makes the equation difficult:
> https://phabricator.wikimedia.org/T200668
>
> Regards
> Magnus Sälgö
> 0046-705937579
> [email protected]
>
> A blogpost I wrote
> https://minancestry.blogspot.com/2018/04/wikidata-has-design-problem.html
>
> On 28 Aug 2019, at 03:49, Uwe Jung <[email protected]> wrote:
>
> Hello,
>
> Many thanks for the answers to my contribution from 24 August.
> I think that all four opinions contain important things to consider.
>
> @David Abián
> I have read the article and agree that in the end the users decide which
> data is good for them or not.
>
> @GerardM
> It is true that in a possible implementation of the idea, the aspect of
> computing load must be taken into account right from the beginning.
>
> Please note that I have not given up on the idea yet. With regard to the
> acceptance of Wikidata, I consider a quality indicator of some kind to be
> absolutely necessary. There will be a lot of ordinary users who would like
> to see something like this.
>
> At the same time I completely agree with David: (almost) every chosen
> indicator is subject to a certain arbitrariness in its selection. There
> won't be one easy-to-understand super-indicator.
> So, let's approach things from the other side. Instead of a global
> indicator, a separate indicator should be developed for each quality
> dimension to be considered. With some dimensions this should be relatively
> easy. For others it could take years until we have agreed on an algorithm
> for their calculation.
>
> Furthermore, the indicators should not represent discrete values but a
> continuum of values. No traffic-light statements (i.e. good, medium, bad)
> should be made. Rather, when displaying the indicators, the value could
> be related to the values of all other objects (e.g. the value x for the
> current data object in relation to the overall average for all objects
> for this indicator). The advantage here is that the overall average can
> increase over time, meaning that the relative position of an individual
> object's value can also decrease over time.
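A minimal sketch of this relative presentation, on hypothetical data (Wikidata has no such score property today): an item's raw value for one quality dimension is reported as a ratio to the current overall average, so an unchanged item can drift below average as the rest of the data improves.

```python
def relative_indicator(scores, item_id):
    """Relate one item's raw indicator value to the average over all items.

    `scores` maps item IDs to raw values for one quality dimension
    (hypothetical data). A result above 1.0 means the item currently
    sits above the overall average for that dimension.
    """
    average = sum(scores.values()) / len(scores)
    return scores[item_id] / average

# Q1's raw value never changes, but as the overall average rises
# (other items improve), Q1's relative position sinks below average.
print(relative_indicator({"Q1": 6, "Q2": 4, "Q3": 5}, "Q1"))  # 1.2
print(relative_indicator({"Q1": 6, "Q2": 9, "Q3": 9}, "Q1"))  # 0.75
```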
>
> Another advantage: users can define the required quality level
> themselves. If, for example, you have high demands on accuracy but few
> demands on the completeness of statements, you can express exactly that.
>
> However, it remains important that these indicators (i.e. the evaluation
> of the individual item) must be stored together with the item and can be
> queried together with the data using SPARQL.
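As a sketch of what such a query could look like: wdt:P9999 below is an invented placeholder for the imagined "quality score" property and does not exist in Wikidata.

```sparql
# Hypothetical: wdt:P9999 stands in for an imagined "quality score"
# property; no such property exists in Wikidata today.
SELECT ?item ?score WHERE {
  ?item wdt:P9999 ?score .
  # Compute the current overall average once, in a subquery.
  { SELECT (AVG(?s) AS ?avgScore) WHERE { ?anyItem wdt:P9999 ?s . } }
  # Keep only items at or above that average.
  FILTER(?score >= ?avgScore)
}
```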
>
> Greetings
>
> Uwe Jung
>
> On Sat, 24 Aug 2019 at 13:54, Uwe Jung <[email protected]> wrote:
>
>> Hello,
>>
>> As the importance of Wikidata increases, so do the demands on the quality
>> of the data. I would like to put the following proposal up for discussion.
>>
>> Two basic ideas:
>>
>>    1. Each Wikidata page (item) is scored after each edit. This score
>>    should express different dimensions of data quality in a quickly
>>    manageable way.
>>    2. A property is created via which the item refers to the score
>>    value. Certain qualifiers can be used for a more detailed description
>>    (e.g. time of calculation, algorithm used to calculate the score
>>    value, etc.).
>>
>>
>> The score value can be calculated either within Wikibase after each data
>> change or "externally" by a bot. The calculation can draw on, among
>> other things, the number of constraint violations, the completeness of
>> references, the degree of completeness in relation to the underlying
>> ontology, etc. There are already some interesting discussions on the
>> question of data quality which can be used here (see
>> https://www.wikidata.org/wiki/Wikidata:Item_quality and
>> https://www.wikidata.org/wiki/Wikidata:WikiProject_Data_Quality).
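A toy sketch of such an externally computed score, under assumed inputs; the two dimensions and their equal weighting are illustrative only, not a proposed algorithm.

```python
def item_score(statements, expected_properties):
    """Toy composite quality score for one item (illustrative only).

    `statements` maps property IDs to True/False (does the statement
    carry a reference?); `expected_properties` lists the properties the
    underlying ontology expects for this item's class. Each dimension
    is a fraction in [0, 1]; the score is their unweighted mean.
    """
    if not statements:
        return 0.0
    ref_completeness = sum(statements.values()) / len(statements)
    coverage = sum(p in statements for p in expected_properties) / len(expected_properties)
    return (ref_completeness + coverage) / 2

# Example: two statements, one referenced; the ontology expects three
# properties, of which two are present.
score = item_score({"P31": True, "P569": False}, ["P31", "P569", "P570"])
print(f"{score:.3f}")  # 0.583
```

A bot could write this value back via the proposed score property, with the algorithm name and timestamp as qualifiers.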
>>
>> Advantages
>>
>>    - Users get a quick overview of the quality of a page (item).
>>    - SPARQL can be used to query only those items that meet a certain
>>    quality level.
>>    - The idea would probably be relatively easy to implement.
>>
>>
>> Disadvantage:
>>
>>    - In a way, the data model is abused by generating statements that no
>>    longer describe the item itself, but rather the representation of
>>    this item in Wikidata.
>>    - Additional computing power must be provided for the regular
>>    calculation of all changed items.
>>    - The score only refers to the quality of pages. If it is
>>    insufficient, corrections still have to be made manually.
>>
>>
>> I would now be interested in the following:
>>
>>    1. Is this idea suitable to effectively help solve existing quality
>>    problems?
>>    2. Which quality dimensions should the score value represent?
>>    3. Which quality dimension can be calculated with reasonable effort?
>>    4. How should they be calculated and represented?
>>    5. What is the most suitable way to further discuss and implement
>>    this idea?
>>
>>
>> Many thanks in advance.
>>
>> Uwe Jung  (UJung <https://www.wikidata.org/wiki/User:UJung>)
>> www.archivfuehrer-kolonialzeit.de/thesaurus
>>
>>
>> _______________________________________________
> Wikidata mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
