MisterSynergy added a comment.

Sorry for being late.

I have worked with quantity datatype properties a lot by now, and I have to disagree here. I think that we should allow only integer bounds when the value is an integer, as the bounds cannot be non-integers in those cases.
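
To make the proposed rule concrete, here is a minimal sketch in Python. The function names are hypothetical and this is not how the constraint checker is actually implemented; it only assumes that amounts and bounds arrive as decimal strings such as "+3" or "+2.5", and that missing bounds are passed as None.

```python
from decimal import Decimal


def is_integer(x: Decimal) -> bool:
    """True if the decimal has no fractional part (e.g. 3 or 3.0)."""
    return x == x.to_integral_value()


def integer_bounds_ok(amount, lower=None, upper=None) -> bool:
    """Hypothetical check: an integer amount must not carry non-integer bounds.

    Non-integer amounts are not judged here; a separate integer
    constraint would be responsible for flagging those.
    """
    if not is_integer(Decimal(amount)):
        return True  # the rule only applies to integer values
    bounds = [Decimal(b) for b in (lower, upper) if b is not None]
    return all(is_integer(b) for b in bounds)
```

With this sketch, an amount of "+3" with bounds "+2.5" and "+3.5" would be rejected, while "+3" with bounds "+2" and "+4", or with no bounds at all, would pass.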

Let’s have a look at actual numbers: right now we have ~3.8M statements of quantity properties with an integer constraint (~2.7M mainsnak, ~1.1M qualifier, barely any in references). In exactly one case there is an integer value with non-integer bounds: the P1114 qualifier of https://www.wikidata.org/wiki/Q26882302#P186; that claim has other issues anyway and needs to be fixed. (I removed ~10 other wrong uses of bounds this morning.)


Maybe I should add a more general rant about the quantity datatype here: users don’t understand it, which is why the vast majority of bounds and a substantial number of units are wrong. Reasons:

  • The meaning of bounds in quantity datatype properties is not well-defined (particularly here: https://www.wikidata.org/wiki/Help:Data_type#quantity and https://www.mediawiki.org/wiki/Wikibase/DataModel#Quantities). The term “uncertainty interval” indicates that it should be used as measurement uncertainty, confidence intervals, etc., but this is actually not the case in Wikidata.
    • This leads to a situation where users use bounds as they personally prefer to, but one cannot rely on a particular meaning of any bounds given in Wikidata.
    • This also encourages users to abuse bounds for other purposes, e.g. to compensate for the lack of other datatypes.
    • General rule: valid bounds can also be found in the referenced sources of a claim. I’d say that clearly more than 95% of all bounds in Wikidata fail that criterion, as they either reflect the personal preferences of individual users or are leftovers of the ±0 bounds that the software added automatically in the past.
  • Due to the lack of a “range” datatype, users add bounds as follows: source A claims a person has 2 children, and source B claims the same person has 3 children. Users add: 2.5±0.5 children, as this covers the range of values found in sources. (Yet I am not sure whether we should have a “range” datatype; multiple claims and use of ranks are the solution here.)
  • The lack of a “number” datatype makes the integer constraint necessary. This works out to some extent as we can see, but it is not optimal:
    • A “number” datatype would make accidental decimal places impossible.
    • A lot of wrong uses of units could be avoided as well (units such as “apple”, “passenger”, “train”, etc.) if the “number” datatype had a different kind of unit, or even no unit, attached.
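
The “2.5±0.5 children” pattern mentioned above is just the midpoint and half-width of the values found in the sources. A small Python sketch of that arithmetic follows; the helper name is made up for illustration and it assumes integer counts as input.

```python
from fractions import Fraction


def range_as_bounds(values):
    """Encode a set of integer source values as (midpoint, half-width).

    Hypothetical helper that mirrors what users do by hand when they
    squeeze a range of source values into a single quantity with bounds.
    """
    lo, hi = min(values), max(values)
    midpoint = Fraction(lo + hi, 2)
    half_width = Fraction(hi - lo, 2)
    return midpoint, half_width


# Source A says 2 children, source B says 3 children.
midpoint, half_width = range_as_bounds([2, 3])
# The midpoint has a denominator other than 1, i.e. it is not an integer,
# so the resulting statement violates an integer constraint.
violates_integer_constraint = midpoint.denominator != 1
```

Whenever the source values are an odd distance apart, the midpoint is not an integer, which is one reason why separate claims with ranks are the cleaner way to record this.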

TASK DETAIL
https://phabricator.wikimedia.org/T167989
