Mike_Peel added a comment.

In https://phabricator.wikimedia.org/T105623#1657039, @daniel wrote:

> In https://phabricator.wikimedia.org/T105623#1656985, @Mike_Peel wrote:
>
> > > If we did not plan to support unit conversion, I would be ready to go 
> > > along with your argument. We would simply say we don't know the precision. 
> > > With unit conversion however we can't do this. And relying on the 
> > > conventions for specifying significant digits seems the best we can do.
> >
> >
> > That makes sense in the back-end to make sure that converted values have 
> > reasonable levels of precision, but does it have to be in the front-end as 
> > well, or stored in the database? A line of code that checks whether the 
> > uncertainty has been set or not, and assumes a minimum uncertainty for 
> > conversion purposes, should handle this issue smoothly, without 
> > mis-estimates of the uncertainty of the given value being displayed to 
> > readers.
>
>
> We can of course discuss if, when and how the explicit +/-X is shown to the 
> user.  I'm completely open to that. One sensible suggestion was to hide it if 
> the actual uncertainty is the same as what we would assume from the decimal 
> representation. In that case, it's OK to hide it, I think. Maybe also if the 
> precision is better than what we would assume. Maybe. But in any case it's 
> crucial to understand that we *have* to consider uncertainty everywhere if we 
> want to allow conversion.


That wouldn't work: the uncertainty should be shown if it is an 
accurate/referenced uncertainty, and that shouldn't depend on whether it's more 
or less than the assumed uncertainty. We should simply say what the uncertainty 
is if we have it, or say that we don't have an uncertainty if we don't.

> I think it makes sense to store the uncertainty in the database, since *if* 
> we assume an uncertainty at some point, users should be able to see, check, 
> modify, and compare it. Also, we need to be able to apply unit conversion for 
> queries, otherwise we couldn't compare feet to meters. And we have to take 
> uncertainty into account, so we know that 2m +/- 0.5 "matches" 7.2ft +/- 0.1. 
> It's not *exactly* the same of course, but these two values were not exact to 
> begin with, so they should match.
>
> We could store "unknown", and then re-calculate the uncertainty every time we 
> need it, but why? What would that gain us?


It would be an accurate way to represent the data that we have, and to clearly 
mark where we don't have uncertainties. It would avoid corrupting the database 
by mixing sourced and assumed uncertainties. We shouldn't be encouraging people 
to alter the assumed uncertainty used for conversion purposes (which they might 
do, e.g. to tweak how the converted number shows), as that would corrupt the 
database even more - we should instead be asking them to source the actual 
uncertainties. IMO there's a lot of up-sides to adopting this approach, and no 
significant down-sides.
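
A minimal sketch of this approach, in Python: the stored record keeps `uncertainty` empty unless a sourced value exists, and an assumed value is derived only when a conversion or comparison needs one. The names `Quantity`, `working_uncertainty` and `matches` are hypothetical, as is the fallback rule (half a unit in the last written decimal place); none of this is taken from the Wikibase code.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional, Tuple

# Conversion factors to metres (1 ft is exactly 0.3048 m).
TO_METRES = {"m": Decimal("1"), "ft": Decimal("0.3048")}


@dataclass(frozen=True)
class Quantity:
    amount: Decimal
    unit: str
    # None means "no sourced uncertainty is known"; it is never overwritten.
    uncertainty: Optional[Decimal] = None


def working_uncertainty(q: Quantity) -> Tuple[Decimal, bool]:
    """Return (uncertainty, is_assumed) without changing the stored record."""
    if q.uncertainty is not None:
        return q.uncertainty, False
    # Assumed fallback (hypothetical rule): half a unit in the last written place.
    exponent = q.amount.as_tuple().exponent
    return Decimal(5).scaleb(exponent - 1), True


def interval_in_metres(q: Quantity) -> Tuple[Decimal, Decimal]:
    """Convert the value's uncertainty interval to metres."""
    u, _ = working_uncertainty(q)
    factor = TO_METRES[q.unit]
    return (q.amount - u) * factor, (q.amount + u) * factor


def matches(a: Quantity, b: Quantity) -> bool:
    """Two values match if their uncertainty intervals overlap."""
    lo_a, hi_a = interval_in_metres(a)
    lo_b, hi_b = interval_in_metres(b)
    return lo_a <= hi_b and lo_b <= hi_a


two_metres = Quantity(Decimal("2"), "m", Decimal("0.5"))
seven_feet = Quantity(Decimal("7.2"), "ft", Decimal("0.1"))
unsourced = Quantity(Decimal("2"), "m")  # no uncertainty stored

print(matches(two_metres, seven_feet))
print(working_uncertainty(unsourced))
```

With this layout, 2m +/- 0.5 and 7.2ft +/- 0.1 still match in a query, while a value entered without an uncertainty stays marked as unsourced in storage.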

> > > We are not making one up. The precision is given implicitly in the 
> > > decimal notation of the number, using the convention of significant 
> > > digits.  This is quite unambiguous for cases like 3.20 (three significant 
> > > digits) or 2.3e3 (2300 with two significant digits). It's ambiguous for 
> > > input like 200 or 1700 - there's a good chance that the zeros are 
> > > insignificant, but we don't really know. We should improve our UI to help 
> > > the user with correct input.
> >
> > I'm not convinced that the implicit assumption you're making here will work 
> > for most situations, so it really shouldn't be displayed to the reader. We 
> > should definitely be encouraging editors to add more accurate estimates of 
> > uncertainties at the same time as the numbers are added though!
>
> I absolutely agree.
>
> > One thing I'm particularly worried about here is that there doesn't seem to 
> > be a good way to tell assumed uncertainties and referenced uncertainties 
> > apart - which will be a huge headache to fix once this data format is in 
> > common usage! So please, let's get this right asap!
>
> Well, in scientific literature at least, a number like 2.30 or 2.3e3 has a 
> definite uncertainty (or number of significant digits). It's given by the convention of 
> the notation. Would you consider that a guess, or a sourced uncertainty?


It's a guess unless it's explicitly stated that the uncertainty is at that 
level, or that the work is following that convention. The standard approach in 
astronomy (which is the part of the scientific literature that I'm most 
familiar with, as a scientist working in that field) is to quote a number along 
with the uncertainty and the significance level associated with that 
uncertainty.
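
To illustrate why the notation alone can only give a guess, here is a rough Python sketch that reads an uncertainty off the written digits under two different assumptions about trailing zeros. The function name `implied_half_width` and the half-unit-in-the-last-place rule are assumptions made for the example, not a description of how Wikidata parses input.

```python
from decimal import Decimal


def implied_half_width(text: str, trailing_zeros_significant: bool = True) -> Decimal:
    """Half a unit in the last significant place of a written decimal number."""
    _sign, digits, exponent = Decimal(text).as_tuple()
    if not trailing_zeros_significant and exponent >= 0:
        # Alternative reading: trailing zeros of a whole number are placeholders.
        remaining = list(digits)
        while len(remaining) > 1 and remaining[-1] == 0:
            remaining.pop()
            exponent += 1
    return Decimal(5).scaleb(exponent - 1)


for text in ("3.20", "2.3e3", "200", "1700"):
    print(text,
          implied_half_width(text, True),
          implied_half_width(text, False))
```

For 3.20 and 2.3e3 both readings agree, but for 200 they differ by a factor of 100, which is why an uncertainty inferred this way should be kept distinguishable from a referenced one.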


TASK DETAIL
  https://phabricator.wikimedia.org/T105623

To: Mike_Peel
Cc: Mike_Peel, Jc3s5h, thiemowmde, kaldari, daniel, Stryn, Lydia_Pintscher, 
Liuxinyu970226, Snipre, Event, Ash_Crow, mgrabovsky, Micru, Denny, He7d3r, 
Bene, Wikidata-bugs, Ricordisamoa, Kelson, MSGJ, Klortho, Wolfvoll, Aklapper, 
aude


