Mike_Peel added a comment.

In https://phabricator.wikimedia.org/T105623#1657039, @daniel wrote:
> In https://phabricator.wikimedia.org/T105623#1656985, @Mike_Peel wrote:
>
> > > If we did not plan to support unit conversion, I would be ready to go
> > > along with your argument. We would simply say we don't know the precision.
> > > With unit conversion however we can't do this. And relying on the
> > > conventions for specifying significant digits seems the best we can do.
> >
> > That makes sense in the back-end to make sure that converted values have
> > reasonable levels of precision, but does it have to be in the front-end as
> > well, or stored in the database? A line of code that checks whether the
> > uncertainty has been set or not, and assumes a minimum uncertainty for
> > conversion purposes, should handle this issue smoothly, without
> > mis-estimates of the uncertainty of the given value being displayed to
> > readers.
>
> We can of course discuss if, when and how the explicit +/-X is shown to the
> user. I'm completely open to that. One sensible suggestion was to hide it if
> the actual uncertainty is the same as what we would assume from the decimal
> representation. In that case, it's OK to hide it, I think. Maybe also if the
> precision is better than what we would assume. Maybe. But in any case it's
> crucial to understand that we *have* to consider uncertainty everywhere if we
> want to allow conversion.

That wouldn't work: the uncertainty should be shown if it is an accurate/referenced uncertainty, and that shouldn't depend on whether it's more or less than the assumed uncertainty. We should simply say what the uncertainty is if we have it, or say that we don't have an uncertainty if we don't.

> I think it makes sense to store the uncertainty in the database, since *if*
> we assume an uncertainty at some point, users should be able to see, check,
> modify, and compare it. Also, we need to be able to apply unit conversion for
> queries, otherwise we couldn't compare feet to meters.
> And we have to take uncertainty into account, so we know that
> 2m +/- 0.5 "matches" 7.2ft +/- 0.1. It's not *exactly* the same of course,
> but these two values were not exact to begin with, so they should match.
>
> We could store "unknown", and then re-calculate the uncertainty every time we
> need it, but why? What would that gain us?

It would be an accurate way to represent the data that we have, and to clearly mark where we don't have uncertainties. It would avoid corrupting the database by mixing sourced and assumed uncertainties. We shouldn't be encouraging people to alter the assumed uncertainty used for conversion purposes (which they might do, e.g. to tweak how the converted number shows), as that would corrupt the database even more - we should instead be asking them to source the actual uncertainties. IMO there are a lot of up-sides to adopting this approach, and no significant down-sides.

> > > We are not making one up. The precision is given implicitly in the
> > > decimal notation of the number, using the convention of significant
> > > digits. This is quite unambiguous for cases like 3.20 (three significant
> > > digits) or 2.3e3 (2300 with two significant digits). It's ambiguous for
> > > input like 200 or 1700 - there's a good chance that the zeros are
> > > insignificant, but we don't really know. We should improve our UI to help
> > > the user with correct input.
> >
> > I'm not convinced that the implicit assumption you're making here will work
> > for most situations, so it really shouldn't be displayed to the reader. We
> > should definitely be encouraging editors to add more accurate estimates of
> > uncertainties at the same time as the numbers are added though!
>
> I absolutely agree.
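As a rough illustration of the approach argued for above (store the uncertainty only when it is sourced, and fall back to the significant-digits convention only at conversion time), here is a minimal Python sketch. Everything in it is an assumption for the sake of the example: `Quantity`, `implied_uncertainty`, `matches` and the `TO_METRES` table are hypothetical names, not Wikibase's data model, and "half a unit in the last written digit" is just one common reading of the convention.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Tuple

# Exact conversion factors to metres (international foot).
TO_METRES = {"m": Decimal("1"), "ft": Decimal("0.3048")}


def implied_uncertainty(text: str) -> Decimal:
    """Half a unit in the last written digit, e.g. "3.20" -> 0.005.

    Assumption: every written digit is significant, which is exactly the
    ambiguous case for inputs such as "200" or "1700".
    """
    value = Decimal(text)
    if not value.is_finite():
        raise ValueError(f"not a finite number: {text!r}")
    return Decimal(5).scaleb(value.as_tuple().exponent - 1)


@dataclass(frozen=True)
class Quantity:
    text: str                              # amount as entered, e.g. "7.2"
    unit: str                              # "m" or "ft" in this sketch
    uncertainty: Optional[Decimal] = None  # None = no sourced uncertainty

    def interval_in_metres(self) -> Tuple[Decimal, Decimal]:
        """(low, high) in metres; the implied uncertainty is used only
        here, so the stored value stays None when nothing was sourced."""
        half = self.uncertainty
        if half is None:
            half = implied_uncertainty(self.text)
        factor = TO_METRES[self.unit]
        amount = Decimal(self.text)
        return (amount - half) * factor, (amount + half) * factor


def matches(a: Quantity, b: Quantity) -> bool:
    """True if the two uncertainty intervals overlap after conversion."""
    low_a, high_a = a.interval_in_metres()
    low_b, high_b = b.interval_in_metres()
    return low_a <= high_b and low_b <= high_a
```

With these definitions, `matches(Quantity("2", "m", Decimal("0.5")), Quantity("7.2", "ft", Decimal("0.1")))` is true, because 7.1 to 7.3 ft is roughly 2.16 to 2.23 m, which lies inside 1.5 to 2.5 m. A value entered as `Quantity("2.00", "m")` keeps `uncertainty` as `None`, so a reader can still tell that no uncertainty was sourced for it.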
>
> > One thing I'm particularly worried about here is that there doesn't seem to
> > be a good way to tell assumed uncertainties and referenced uncertainties
> > apart - which will be a huge headache to fix once this data format is in
> > common usage! So please, let's get this right asap!
>
> Well, in scientific literature at least, a number like 2.30 or 2.3e3 has a
> definite uncertainty (resp. significant digits). It's given by convention of
> the notation. Would you consider that a guess, or a sourced uncertainty?

It's a guess unless it's explicitly stated that the uncertainty is at that level, or that the work is following that convention. The standard approach in astronomy (which is the part of the scientific literature that I'm most familiar with, as a scientist working in that field) is to quote a number along with the uncertainty and the significance level associated with that uncertainty.

TASK DETAIL
  https://phabricator.wikimedia.org/T105623

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Mike_Peel
Cc: Mike_Peel, Jc3s5h, thiemowmde, kaldari, daniel, Stryn, Lydia_Pintscher, Liuxinyu970226, Snipre, Event, Ash_Crow, mgrabovsky, Micru, Denny, He7d3r, Bene, Wikidata-bugs, Ricordisamoa, Kelson, MSGJ, Klortho, Wolfvoll, Aklapper, aude

_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
