Thanks, the prototype helps make some of this more concrete.

I am increasingly wondering if "uncertainty" will be overloaded here.
People seem to want to use it for various types of measurement uncertainty
(e.g. the standard error), ranges with no defined central value,
and distributional summaries (e.g. max and min), as well as for the
precision with which a value is entered (as in the "auto-certainty" value
in the prototype). These are all quite different beasts, and conflating
them will probably lead to problems - particularly for precision versus the
rest. Which do we choose, if both apply? How will we know which is meant?
Maybe marking "auto-certainty" values somehow would mitigate the latter
problem, at least.
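
As a rough sketch of what such marking could look like: each annotation carries an explicit kind, plus a flag for values derived from the entered digits, so precision of entry can sit next to a stated measurement uncertainty without being confused with it. All names here (UncertaintyKind, Annotation, Quantity) are made up for illustration and are not part of the prototype or of any proposal on the wiki.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class UncertaintyKind(Enum):
    STANDARD_ERROR = "standard_error"    # measurement uncertainty
    RANGE = "range"                      # interval with no defined central value
    DISTRIBUTION = "distribution"        # summaries such as min and max
    ENTRY_PRECISION = "entry_precision"  # inferred from the digits typed


@dataclass(frozen=True)
class Annotation:
    kind: UncertaintyKind
    lower: float
    upper: float
    auto: bool = False  # True when derived from the entry, not stated by a source


@dataclass
class Quantity:
    value: Optional[float]  # None for a pure range
    annotations: List[Annotation] = field(default_factory=list)

    def of_kind(self, kind: UncertaintyKind) -> List[Annotation]:
        """Return only the annotations of the requested kind."""
        return [a for a in self.annotations if a.kind is kind]


# A value entered as "12.5" that also has a stated standard error of 0.2:
# both annotations are kept, and each says which kind it is.
q = Quantity(12.5, [
    Annotation(UncertaintyKind.ENTRY_PRECISION, 12.45, 12.55, auto=True),
    Annotation(UncertaintyKind.STANDARD_ERROR, 12.3, 12.7),
])
```

With something like this, "which do we choose if both apply?" becomes a display question rather than a storage question, since neither kind overwrites the other.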

Avenue

On Thu, Dec 20, 2012 at 4:10 PM, Denny Vrandečić <
[email protected]> wrote:

> I am still trying to catch up with the whole discussion and to distill the
> results, both here and on the wiki.
>
> In the meanwhile, I have tried to create a prototype of how a complex
> model can still be entered in a simple fashion. A simple demo can be found
> here:
>
> <http://simia.net/valueparser/>
>
> The prototype is not i18n.
>
> The user has to enter only the value, in a hopefully intuitive way (try it
> out), and the full interpretation is displayed here (that, alas, is not
> intuitive, admittedly).
>
> Cheers,
> Denny
>
>
>
>
>
> 2012/12/20 <[email protected]>
>
>>
>> (Proposal 3, modified)
>> * value (xsd:double or xsd:decimal)
>>
>> * unit (a wikidata item)
>>
>> * totalDigits (xsd:smallint)
>> * fractionDigits (xsd:smallint)
>> * originalUnit (a wikidata item)
>> * originalUnitPrefix (a wikidata item)
>> JMc: I rearranged the list a bit and suggested simpler naming
>>
>> JMc: Is not originalUnitPrefix directly derived from originalUnit?
>>
>> JMc: It may be more efficient to store the original value rather than
>> reconstruct it. It may even be better to store the original value somewhere
>> else entirely, earlier in the process, e.g. within the context that you
>> indicate would be worthwhile to capture, because I wouldn't expect a lot of
>> retrievals, but you certainly anticipate usage patterns better than I do.
>>
>> How about just:
>>
>>
>> Datatype: .number (Proposal 4)
>>
>> -----------------------------------------
>>   :value (xsd:double or xsd:decimal)
>>
>>   :unit (a wikidata item)
>>   :totalDigits (xsd:smallint)
>>   :fractionDigits (xsd:smallint)
>>
>>
>>   :original (a wikidata item that is a number object)
>>
>> On 20.12.2012 03:08, Gregor Hagedorn wrote:
>>
>> On 20 December 2012 02:20,  <[email protected]> wrote:
>>
>> For me the question is how to name the precision information. Do not the
>> XSD facets "totalDigits" and "fractionDigits" work well enough? I mean
>>
>> Yes, that would be one way of modeling it. And I agree with you that,
>> although the xsd attributes were originally devised for datatypes,
>> there is nothing wrong with re-using them for quantities and
>> measurements.
>>
>> So one way of expressing a measurement with significant digits is:
>> (Proposal 1)
>> * normalizedValue
>> * totalDigits
>> * fractionDigits
>> * originalUnit
>> * normalizedUnit
>>
>> To recover the original information (e.g. that the original value was
>> in feet with a given number of significant digits) the software must
>> convert normalizedUnit to originalUnit, scale to totalDigits with
>> fractionDigits, calculate the remaining powers of ten, and use
>> information stored with each unit to decide whether the result should
>> be expressed using an SI unit prefix (exa, tera, giga, mega, kilo,
>> hecto, deca, centi, etc.). Some units use them, others do not, and some
>> units use only some. Hectoliter is common; hectometer would be very
>> odd. This is slightly complicated by the fact that for some units
>> prefix usage in lay topics differs from scientific use.
>>
>> If all numbers were expressed ONLY as total digits with fraction
>> digits and unit-prefix, i.e. no power-of-ten exponential, the above
>> would be sufficiently complete. However, without additional
>> information it does not allow the following entry to be recovered:
>>
>> 100,230 * 10^3 tons
>> (value 1.0023e8, 6 total, 3 fractional digits, original unit tons,
>> normalized unit gram)
>>
>> I had therefore made (on the wiki) the proposal to express it as:
>>
>> (Proposal 2)
>> * normalizedValue
>> * significantDigits (= and I am happy with totalDigits instead)
>> * originalUnit
>> * originalUnitPrefix
>> * normalizedUnit
>>
>> However, I see now that the analysis was wrong. It needs
>> fractionDigits in addition to totalDigits, else a similar problem may
>> occur, i.e. the distribution of the total order of magnitude of the
>> number between non-fractional digits, fractional digits, powers of 10
>> and powers of 10 expressed through SI prefixes is still not unambiguous.
>>
>> So the minimal representation seems to be:
>>
>> (Proposal 3)
>> * normalizedValue (xsd:double or xsd:decimal)
>> * totalDigits (xsd:smallint)
>> * fractionDigits (xsd:smallint)
>> * originalUnit (a wikidata item)
>> * originalUnitPrefix (a wikidata item)
>> * normalizedUnit (a wikidata item)
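
A minimal sketch of how the entered form might be recovered from a Proposal 3 record, assuming the unit conversion factor and the prefix factor have already been looked up from the unit items. The function recover_entry and its parameter names are hypothetical. The power of ten is not stored, so it is derived here from totalDigits and fractionDigits.

```python
from decimal import Decimal


def recover_entry(normalized_value, unit_factor, prefix_factor,
                  total_digits, fraction_digits):
    """Return (mantissa, exponent) of the number as originally entered.

    unit_factor:   normalized units per original unit (e.g. 0.3048 m per ft)
    prefix_factor: value of the SI prefix used on entry (e.g. 1000 for kilo)
    Numeric arguments are passed as strings to keep decimal arithmetic exact.
    """
    # Undo normalization: express the value in the original, prefixed unit.
    v = Decimal(normalized_value) / (Decimal(unit_factor) * Decimal(prefix_factor))
    if v == 0:
        exponent = 0
    else:
        # Digits in front of the decimal point in the entered mantissa.
        integer_digits = total_digits - fraction_digits
        # adjusted() is the exponent of the most significant digit.
        exponent = v.adjusted() + 1 - integer_digits
    # Shift out the power of ten, then pad or round to fraction_digits places.
    mantissa = v.scaleb(-exponent).quantize(Decimal(1).scaleb(-fraction_digits))
    return mantissa, exponent
```

For a value stored as 3.81 m that was entered in feet with 4 total and 2 fraction digits, recover_entry("3.81", "0.3048", "1", 4, 2) gives a mantissa of 12.50 and an exponent of 0, so the trailing zero survives. For a mantissa entered with 6 total and 3 fraction digits and a stored value of 100230 (unit and prefix factors taken as 1 for simplicity), it gives 100.230 and an exponent of 3.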
>>
>> Adding the originalUnitPrefix has the advantage that it gathers
>> knowledge from users and data creators or resources about which unit
>> prefix is appropriate in a given context.
>>
>> I am very critical of the current Wikidata plan to solve this problem
>> by heuristics; I do not yet see a data set that sufficiently tests the
>> heuristics. Gathering information from the data entered and creating a
>> formatting heuristics module over the coming years (instead of weeks)
>> will be valuable for reformatting. Proposal 3 allows this information
>> to be gathered.
>>
>> Gregor
>>
>> Note 1: The question of other means to express accuracy or precision,
>> e.g. by error margins, statistical measures of spread such as
>> variance, confidence intervals, percentiles, min/max etc. is not yet
>> covered.
>>
>> Given the present discussion, this should probably be separately agreed upon.
>>
>> Note 2: Wikipedia infoboxes may want to override it; this is for
>> data entry, review, curation, and a default display where no other
>> is defined.
>>
>> _______________________________________________
>> Wikidata-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
>
>
> --
> Project director Wikidata
> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
> Tel. +49-30-219 158 26-0 | http://wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
> Registered in the register of associations of the Amtsgericht
> Berlin-Charlottenburg under number 23855 B. Recognized as charitable by the
> Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
