Re: [Wikidata-l] Data values

2012-12-19 Thread Friedrich Röhrs
I don't understand why 1.06e-8 is absolutely necessary for sorting and
comparison. PHP allows for the definition of custom sorting functions. If a
custom datatype is defined, a custom sorting/comparison function can be
defined too. Or am I missing some performance points?
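
For illustration, such a custom comparator might look like this (a minimal
sketch over a hypothetical quantity structure, not actual Wikibase code):

<?php
// Hypothetical quantity values: an amount plus a power-of-ten exponent
// relative to the base unit (metre).
$values = array(
    array('amount' => 10.6, 'exp' => -9), // 10.6 nm
    array('amount' => 2.1,  'exp' =>  0), // 2.1 m
    array('amount' => 282,  'exp' => -2), // 282 cm
);

// Custom comparator: normalize both operands to metres, then compare.
usort($values, function ($a, $b) {
    $na = $a['amount'] * pow(10, $a['exp']);
    $nb = $b['amount'] * pow(10, $b['exp']);
    if ($na == $nb) {
        return 0;
    }
    return ($na < $nb) ? -1 : 1;
});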


On Wed, Dec 19, 2012 at 10:30 AM, Nikola Smolenski smole...@eunet.rs wrote:

 On 19/12/12 08:53, Gregor Hagedorn wrote:

 I agree. What I propose is that the user interface supports entering
 and proofreading 10.6 nm as 10.6 plus n (= nano) plus meter.
 How the value is stored in the data property, whether as 10.6 floating
 point or as 1.06e-8 is a second issue -- the latter is probably
 preferable. I only intend to show that scientific values are not


 Perhaps both should be stored. 1.06e-8 is necessary for sorting and
 comparison. But 10.6 nm is how the user entered it, presumably how it was
 written in the source that the user used, how it is preferably used in the
 given field, and how other users would want to see it and edit it.

 As an example, human height is commonly given in centimetres, while
 building height is commonly given in metres. So, users will probably prefer
 to edit the tallest person as 282 cm and the shortest building as 2.1 m even
 though the absolute values are similar.




Re: [Wikidata-l] Data values

2012-12-19 Thread Daniel Kinzler
On 19.12.2012 11:56, Friedrich Röhrs wrote:
 I don't understand why 1.06e-8 is absolutely necessary for sorting and
 comparison. PHP allows for the definition of custom sorting functions. If a
 custom datatype is defined, a custom sorting/comparison function can be
 defined too. Or am I missing some performance points?

We are talking about searching and sorting millions of data entries - doing that
in PHP would be extremely slow and would take far more memory than we have. It
has to be done natively in the database. So we have to use a data representation
that can be natively compared and sorted by the database (at the very least by
MySQL, but ideally by many different database systems).
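
As a sketch of what that could look like (illustrative schema and property
name, not the actual Wikibase tables):

<?php
// Illustrative schema: quantities(entity VARCHAR, property VARCHAR,
// value DOUBLE), with value normalized to the base unit of its dimension.
$pdo = new PDO('mysql:host=localhost;dbname=example', 'user', 'pass');

// A composite index lets MySQL answer range queries from the index itself,
// instead of scanning every row or comparing values in PHP.
$pdo->exec('CREATE INDEX idx_prop_value ON quantities (property, value)');

// "All entities with a height above 2 m" becomes one indexed range scan.
$stmt = $pdo->prepare(
    'SELECT entity, value FROM quantities
     WHERE property = :prop AND value > :min
     ORDER BY value'
);
$stmt->execute(array(':prop' => 'height', ':min' => 2.0));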

-- daniel

-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.




Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
 In addition to a storage option of the desired unit prefix (this may
 be considered an original prefix, since naturally re-users may wish to
 reformat this).

 I see no point in storing the unit used for input.

I think you plan to store the unit (which would be meter), so you
don't want to store prefixes, correct?

Please argue why you don't see a point. You want to give the size of
the universe, the distance to New York, and the size of the proton all
in meters? If not, with which algorithm will you restore the SI prefix,
or rather, recognize which SI prefix is usable? We do not use Mm in
common language, so we give the circumference of the earth as roughly
40 000 km and not as 40 Mm. We don't write 4*10^7 m either.

 it is probably necessary to store the number of
 significant decimals.

 That's how Denny proposed to calculate the default accuracy. If the
 accuracy is given by a complex model (e.g. a gamma distribution), then it
 might be handy to have a simple value that tells us the significant digits.

 Hm... perhaps it's best to always express accuracy as +/-n, and allow for
 more detailed information (standard deviation, whatever) as *additional*
 information about the accuracy (could be modelled as a qualifier
 internally).

I fear that conflates two separate levels of giving a measure of
measurement _precision_ (I believe accuracy is the wrong term here;
precision and accuracy are related but distinct concepts). So 4.10
means that the last digit is significant, i.e. the best estimate is at
least between 4.095 and 4.105 (but it may be better). 4.10 +/- 0.005
means it is precisely between 4.095 and 4.105, as opposed to 4.10 +/- 0.004,
4.10 +/- 0.003, 4.10 +/- 0.002 etc.
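
In code, the interval implied by the typed decimals could be derived like
this (an illustrative sketch, not Wikidata code):

<?php
// Derive the implied +/- half-interval from the number of decimals typed.
function impliedHalfInterval($input) {
    $dot = strpos($input, '.');
    $decimals = ($dot === false) ? 0 : strlen($input) - $dot - 1;
    // "4.10" has 2 decimals => last digit step is 0.01 => +/- 0.005
    return 0.5 * pow(10, -$decimals);
}

// impliedHalfInterval('4.10') == 0.005, so "4.10" reads as "at least
// between 4.095 and 4.105"; an explicit "4.10 +/- 0.002" carries extra
// information that the typed decimals alone cannot express.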

Furthermore, a quantity may be given as 4.10-4.20-4.35. The precision
of measurement and the measure of variance and dispersion are
separate concepts.


 I believe in the user interface this need not
 be a visible setting; simply the number of digits can be preserved.
 Without these it is impossible to store and reproduce information like
 10.20 nm; it would be returned as 1.02 × 10^-8 m.

 No, it would return using whatever system of measurement the user has
 selected in their preferences.

then you have lost the information. There is no user selection in
this in science.

 Complex heuristics
 may guess when to use the scientific SI prefixes instead. The
 trailing zero cannot be reproduced however when completely relying on
 IEEE floating-point.

 We'll need heuristics to pick the correct secondary unit (e.g. nm or km). The

(I believe there is no such thing as a secondary unit -- did you make
that term up? Only m is a unit of measurement; the n or k are
prefixes, see http://en.wikipedia.org/wiki/SI_prefix )

 general rule could be to pick a unit so that the actual value is between 1 and
 10, with some additional rules for dealing with cultural specialities (the
 decimeter is rarely used, the hectoliter however is pretty common; the
 decagram is commonly used in Austria only, etc.).
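
For concreteness, such a rule might be coded like this (illustrative,
abridged prefix table; not from Wikibase):

<?php
// Prefix-picking heuristic: choose the largest SI prefix that keeps the
// displayed number at or above 1.
function pickPrefix($metres) {
    $prefixes = array(9 => 'G', 6 => 'M', 3 => 'k', 0 => '',
                      -3 => 'm', -6 => 'µ', -9 => 'n');
    foreach ($prefixes as $exp => $prefix) {
        if (abs($metres) >= pow(10, $exp)) {
            return sprintf('%g %sm', $metres / pow(10, $exp), $prefix);
        }
    }
    return sprintf('%g nm', $metres * 1e9);
}

// pickPrefix(1.06e-8) gives "10.6 nm", but pickPrefix(4e7) gives "40 Mm" --
// exactly the case where common usage says "40 000 km", hence the need for
// the cultural rules mentioned above.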

You would also need to know which prefix is applicable to which unit
in which context. In a scientific context different prefixes are used
than in a lay context. In a lay context astronomical temperatures may
be given in degrees Celsius, in a scientific context in kelvin. This is
not just a user preference.

I agree that the system should allow explicit conversion in infoboxes.
I disagree that you should create an artificial intelligence system for
Wikidata that knows more about unit usage than the authors. To store
the wisdom of the authors, storing both the unit and the original unit
prefix is necessary.


You write: "The precision can be derived from the accuracy and vice
versa, using appropriate heuristics."

I _terribly strongly_ doubt that. Can you give any proof of that? For
precision I can use statistics; for accuracy I need an indirect,
separate and precise method of estimation. If you have a
laser-distance measurement device, the precision can be estimated by
yourself by repeated measurements at various times, temperatures, etc.
But unless you have an objective distance standard, you have no means
to determine whether the device is always off by 10 cm
because someone screwed up the software program inside the device.

 But they are not the same. IMHO, the accuracy should always be stored with the
 value, the precision never.

I fear that is a view of how data in a perfect world should be known,
not a reflection of the kind of data that people need to store in
Wikidata. Very often only the precision will be known or available to
its authors, or worse, the source may not say which it is.

Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread Martynas Jusevičius
Hey wikidatians,

occasionally checking threads in this list like the current one, I get
a mixed feeling: on one hand, it is sad to see the efforts and
resources wasted as Wikidata tries to reinvent RDF, and now also
triplestore design as well as XSD datatypes. What's next, WikiQL
instead of SPARQL?

On the other hand, it feels reassuring as I was right to predict this:
http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00056.html
http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00750.html

Best,

Martynas
graphity.org

On Wed, Dec 19, 2012 at 4:11 PM, Daniel Kinzler
daniel.kinz...@wikimedia.de wrote:
 On 19.12.2012 14:34, Friedrich Röhrs wrote:
 Hi,

 Sorry for my ignorance, if this is common knowledge: What is the use case for
 sorting millions of different measures from different objects?

 Finding all cities with more than 10 inhabitants requires the database to
 look through all values for the property population (or even all properties
 with countable values, depending on implementation and query planning), compare
 each value with 10 and return those with a greater value. To speed this
 up, an index sorted by this value would be needed.

 For cars there could be entries by the manufacturer, by some
 car-testing magazine, etc. I don't see how this could be adequately
 represented/sorted by a database-only query.

 If this cannot be done adequately on the database level, then it cannot be
 done efficiently, which means we will not allow it. So our task is to come
 up with an architecture that does allow this.

 (One way to allow scripted queries like this to run efficiently is to do
 this in a massively parallel way, using a map/reduce framework. But that's
 also not trivial, and would require a whole new server infrastructure.)

 If however this is necessary, I still don't understand why it must affect
 the datavalue structure. If an index is necessary it could be done over a
 serialized representation of the value.

 Serialized can mean a lot of things, but an index on some data blob is only
 useful for exact matches; it cannot be used for greater/lesser queries. We
 need to map our values to scalar data types the database can understand
 directly, and use for indexing.

 This needs to be done anyway, since the values are
 saved in a specific unit (which is just a wikidata item). To compare them on
 a database level they must all be saved in the same unit, or some sort of
 procedure must be used to compare them (or am I missing something again?).

 If they measure the same dimension, they should be saved using the same unit
 (probably the SI base unit for that dimension). Saving values using different
 units would make it impossible to run efficient queries against these values,
 thereby defeating one of the major reasons for Wikidata's existence. I don't
 see a way around this.

 -- daniel

 --
 Daniel Kinzler, Softwarearchitekt
 Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.




Re: [Wikidata-l] Data values

2012-12-19 Thread Marco Fleckinger



On 2012-12-19 15:11, Daniel Kinzler wrote:

On 19.12.2012 14:34, Friedrich Röhrs wrote:

Hi,

Sorry for my ignorance, if this is common knowledge: What is the use case for
sorting millions of different measures from different objects?


Finding all cities with more than 10 inhabitants requires the database to
look through all values for the property population (or even all properties
with countable values, depending on implementation and query planning), compare
each value with 10 and return those with a greater value. To speed this
up, an index sorted by this value would be needed.


Add to that multiple simultaneous sorting operations.


For cars there could be entries by the manufacturer, by some
car-testing magazine, etc. I don't see how this could be adequately
represented/sorted by a database-only query.


If this cannot be done adequately on the database level, then it cannot be done
efficiently, which means we will not allow it. So our task is to come up with an
architecture that does allow this.

(One way to allow scripted queries like this to run efficiently is to do this
in a massively parallel way, using a map/reduce framework. But that's also not
trivial, and would require a whole new server infrastructure).

Software developers are not allowed to just think of the status quo; they
also have to think of the use cases the solution might be used for.


There is e.g. the idea of pushing the monuments lists into Wikidata.
In Austria alone there are 36,000-37,000 of those. Germany is much bigger
but has a similar history, with probably a similar number per square
kilometer. Sorting these by distance to a specific place needs to be
done by the database. Everything else will be too inefficient.



If however this is necessary, I still don't understand why it must affect the
datavalue structure. If an index is necessary it could be done over a serialized
representation of the value.


Serialized can mean a lot of things, but an index on some data blob is only
useful for exact matches; it cannot be used for greater/lesser queries. We need
to map our values to scalar data types the database can understand directly, and
use for indexing.


+1


This needs to be done anyway, since the values are
saved in a specific unit (which is just a wikidata item). To compare them on a
database level they must all be saved in the same unit, or some sort of
procedure must be used to compare them (or am I missing something again?).


If they measure the same dimension, they should be saved using the same unit
(probably the SI base unit for that dimension). Saving values using different
units would make it impossible to run efficient queries against these values,
thereby defeating one of the major reasons for Wikidata's existence. I don't see
a way around this.

IMHO this should be part of a model. E.g. altitudes are usually measured
in metres or feet, never in km or yards. Distances have the same SI base
unit but are also measured in km, depending on the use case.


Maybe we should make a difference between internal usage and
visualization. Comparing meters with kilometers and feet is quite
difficult; rescaling everything for visualization is not.


Cheers

Marco



Re: [Wikidata-l] Data values

2012-12-19 Thread Nikola Smolenski

On 19/12/12 15:33, Nikola Smolenski wrote:

On 19/12/12 12:23, Daniel Kinzler wrote:

I don't think we can sensibly support historical units with unknown
conversions,
because they cannot be compared directly to SI units. So, they
couldn't be used
to answer queries, can't be converted for display, etc - they arn't
units in any
sense the software can understand. This is a solvable problem, but
would add a
tremendous amount of complexity.


Ah, but they could still be meaningfully compared to each other. And if
an approximate conversion is known, it could still be used, so that the
measure is converted and its uncertainty increased.
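
A sketch of such an uncertainty-increasing conversion (first-order error
propagation over illustrative numbers, not anything from the thread):

<?php
// Convert a measure given in an old unit using an *approximate* conversion
// factor, widening the uncertainty accordingly.
// Both arguments are array(amount, +/- half-width).
function convertApprox(array $value, array $factor) {
    $amount = $value[0] * $factor[0];
    // First-order error propagation for a product of two quantities.
    $halfWidth = abs($amount)
        * ($value[1] / abs($value[0]) + $factor[1] / abs($factor[0]));
    return array($amount, $halfWidth);
}

// E.g. 3 +/- 0.5 of an old unit assumed to equal 500 m +/- 80 m:
// convertApprox(array(3, 0.5), array(500, 80)) yields array(1500, 490),
// i.e. 1500 m +/- 490 m.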

Just throwing more info here: there might also be cases where we could
have multiple competing conversions. Somewhat similar to units,
something that I would very much like to see is comparison of various
monetary values, adjusted for inflation or exchange rate. But then you
would have various estimates of inflation by various bodies and you
might want to compare by either of them (or a combination of them?).


Appropriate conversion might also depend on the item in question. For 
example, old censuses sometimes measure population not in people but in 
households. In some cases we might have an idea of how large a household 
was, and can use it to give an estimate of the population.




Re: [Wikidata-l] Data values

2012-12-19 Thread Avenue
On Wed, Dec 19, 2012 at 2:32 PM, Marco Fleckinger 
marco.fleckin...@wikipedia.at wrote:


 IMHO this should be part of a model. E.g. altitudes are usually measured
 in metres or feet, never in km or yards. Distances have the same SI base
 unit but are also measured in km, depending on the use case.


No, altitudes are sometimes measured in km, at least once you get beyond
the Earth's surface.

From http://en.wikipedia.org/wiki/Hubble_Space_Telescope:
Orbit height 559 km (347 mi)

From http://en.wikipedia.org/wiki/Olympus_Mons:
Peak 21 km (69,000 ft) above datum


Re: [Wikidata-l] Data values

2012-12-19 Thread Daniel Kinzler
On 19.12.2012 15:32, Marco Fleckinger wrote:
 Maybe we should make a difference between internal usage and visualization.
 Comparing meters with kilometers and feet is quite difficult; rescaling
 everything for visualization is not.

Not maybe. Definitely. Visualization is based on user preference, interface
language, and heuristics for picking a decent unit based on dimension and
accuracy. The internal representation should use the same unit for all
quantities of a given dimension.

-- daniel


-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.




Re: [Wikidata-l] Data values

2012-12-19 Thread Daniel Kinzler
On 19.12.2012 15:26, Avenue wrote:
 What about the North and South Poles? 

I'm sure standard coordinate systems have a convention for representing them.

 Won't we need lots of units that are not SI units (e.g. base pairs, IQ
 points, Scoville heat units, $ and €) and can't readily be translated into
 them? Why would historical units with unknown conversions pose any more
 problem than these?

These all pose the same problems, correct. At the moment, I'm very unsure about
how to accommodate these at all. Maybe we can have them as custom units, which
are fixed for a given property and cannot be converted.

-- daniel

-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.




Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
On 19 December 2012 15:11, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
 If they measure the same dimension, they should be saved using the same unit
 (probably the SI base unit for that dimension). Saving values using different
 units would make it impossible to run efficient queries against these values,
 thereby defeating one of the major reasons for Wikidata's existence. I don't
 see a way around this.

Daniel confirms (in a separate mail) that Wikidata indeed intends to
convert any derived SI units to a common formula of base units.

Example: a quantity like 1013 hectopascal, the common unit for
meteorological barometric pressure (this used to be millibar), would be
stored and re-displayed as
1.013 × 10^5 kg⋅m⁻¹⋅s⁻²

I see several problems with this approach:

1. Many base units are little known: kg⋅m²⋅s⁻³⋅A⁻² for the ohm... It
breaks communication with humans curating data on Wikidata. It will
make it very difficult to check data entered into Wikidata for
correctness, because the data displayed after saving will have little
relation to the data entered. This makes Wikidata inherently
unsuitable for an effort like Wikipedia, with many authors and the
reliance on fact checking.

2. Even for standard base units, there is often a 1:n relation, e.g.
both gray and sievert have the same base units. The base unit for lumen
is candela (because the steradian is not a base unit, but part of the
derived unit's definition).

Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread Sven Manguard

 I don't think we can sensibly support historical units with unknown
 conversions, because they cannot be compared directly to SI units. So, they
 couldn't be used to answer queries, can't be converted for display, etc --
 they aren't units in any sense the software can understand. This is a
 solvable problem, but would add a tremendous amount of complexity.


I get the feeling that I might be the only person on this thread who
doesn't have a maths/sciences/computers background. I'm going to be
frank here: We need to snap out of the mindset that all of the data we're
collecting is going to be easily expressible using modern scientific units
and methodologies. If we try to cram everything into a small number of
common units, without giving the users some method of expressing
non-standard/uncommon/non-scientific values, we're going to have a massive
database that is going to at best be cumbersome and at worst be useless for
a great deal of information. Traditional Chinese units of measurement [1]
have changed their actual value over time. A li in one century is not as
long as it is in another century, and while there is a li-to-SI conversion,
it's artificial; when we try to use the modern li to measure something,
we get a different value for that thing than the historically documented
li value says it should be.

There is a balance. The more flexible the parameters, the easier it is to
put data in, but the harder it is for computers to make useful connections
with it. I'm not sure how to handle this, but I am sure that we can't just
keep pretending that all of the data we're going to collect falls nicely
into the metric system. Reality just doesn't work that way, and for
Wikidata to be useful, we can't discount data that doesn't fit in the mold
of modern units.

Sven

[1] http://en.wikipedia.org/wiki/Chinese_units_of_measurement


Re: [Wikidata-l] Data values

2012-12-19 Thread Marco Fleckinger



On 2012-12-19 16:56, Daniel Kinzler wrote:

On 19.12.2012 16:47, Gregor Hagedorn wrote:

Daniel confirms (in a separate mail) that Wikidata indeed intends to
convert any derived SI units to a common formula of base units.

Example: a quantity like 1013 hectopascal, the common unit for
meteorological barometric pressure (this used to be millibar), would be
stored and re-displayed as
1.013 × 10^5 kg⋅m⁻¹⋅s⁻²


Converted and stored, yes, but not displayed. For display, it would be converted
to a customary/convenient unit according to the user's (or client site's)
locale, using a bit of heuristics to get the scale (order of magnitude) right.

Of course, in wikitext, the desired output unit can be specified.


Actually we have 3 different use cases for values:

1. Internally in the database
2. On wikidata.org
3. On other projects like WP and also WM-external projects

SI units shall be used internally (1).

On (2) the user can decide what he wants.

On (3) either some default setting of the MediaWiki project or the
article's author determines what is desired.

Via the API (also (3)) you should be able to choose:
* precision
* displaying of accuracy
* unit

-- Marco



Re: [Wikidata-l] Data values

2012-12-19 Thread Denny Vrandečić
Martynas,

could you please let me know where RDF or any of the W3C standards covers
topics like units, uncertainty, and their conversion. I would be very much
interested in that.

Cheers,
Denny




2012/12/19 Martynas Jusevičius marty...@graphity.org

 Hey wikidatians,

 occasionally checking threads in this list like the current one, I get
 a mixed feeling: on one hand, it is sad to see the efforts and
 resources wasted as Wikidata tries to reinvent RDF, and now also
 triplestore design as well as XSD datatypes. What's next, WikiQL
 instead of SPARQL?

 On the other hand, it feels reassuring as I was right to predict this:
 http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00056.html
 http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00750.html

 Best,

 Martynas
 graphity.org





-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
Körperschaften I Berlin, Steuernummer 27/681/51985.


Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
 These all pose the same problems, correct. At the moment, I'm very unsure
 about how to accommodate these at all. Maybe we can have them as custom
 units, which are fixed for a given property and cannot be converted.

I think the proposal to use wikidata items for the units (that is, both
base and derived SI units as well as Imperial/US customary units) is
the most sensible.

Let people use the units they need. Then write software that picks up
the units that people use (after verifying common and correct use) by
means of their Wikidata item ID. With successive versions of Wikidata,
pick up more and more of these and make them available for conversion.

This way Wikidata will become what is needed.

I fear the discussion presently is about anticipating the needs of the
next years and not allowing any data into Wikidata that have not been
foreseen.

There may be a way for Wikidata to have enough artificial
intelligence to predict which unit prefixes are usable in common
topics versus scientific topics, and which units shall be used: where
megaton is used (TNT yield of atomic bombs) and where 10^x tons are
preferred (shipping); that the base unit for weight is the kilogram,
but for gold in a certain value range the ounce may be preferred, and
gemstones and pearls are given in carat
(http://en.wikipedia.org/wiki/Carat_(unit) ).

But I believe forcing Wikidata to solve that problem first and
ignoring the wisdom of the users is the wrong path.

Modelling Wikidata on the feet-versus-meter and Fahrenheit-versus-Celsius
problem, where US citizens have a different personal preference, is
misleading. The issue is much more complex.

Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread Daniel Kinzler
On 19.12.2012 16:41, Marco Fleckinger wrote:
 I assume there's a table of usual units for different purposes. E.g.
 altitudes are displayed in m and ft. Out of those, one is chosen by the
 user's locale setting. My locale setting would be kind of metric, therefore
 it will be displayed in m on my Wikidata surface. On enwiki it will probably
 be displayed in ft.

I'd have thought that we'd have one such table per dimension (such as length
or weight). It may make sense to override that on a per-property basis, so
2300m elevation isn't shown as 2.3km. Or that can be done in the template that
renders the value.

 My suggestion would be:

 * Somebody types in 4.10, so 4.10 will be saved. There is no accuracy
 available, so n/a is saved for the accuracy, or even the JavaScript way
 could be used, which would be undefined (because not mentioned).
 Retrieving this will result in 4.10 or {value: 4.10}.

What is saved would depend on unit conversion; the value actually stored in
the database would be in a base unit. In addition, the input's precision
would be used to derive the value's accuracy: entering 4.1m will make the
accuracy default to 10cm (+/- 5cm).
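
A sketch of that derivation at parse time (hypothetical helper, not the
actual DataValues implementation):

<?php
// Split input like "4.1m" into amount, unit and a default accuracy
// derived from the number of typed decimals.
function parseQuantity($input) {
    if (!preg_match('/^(\d+(?:\.(\d*))?)\s*(\S*)$/u', trim($input), $m)) {
        return null; // not a simple quantity
    }
    $decimals = isset($m[2]) ? strlen($m[2]) : 0;
    return array(
        'amount'   => (float) $m[1],
        'unit'     => $m[3],
        // one decimal => last digit step 0.1 => default +/- 0.05
        'accuracy' => 0.5 * pow(10, -$decimals),
    );
}

// parseQuantity('4.1m') gives amount 4.1, unit 'm', accuracy 0.05,
// i.e. the +/- 5cm default described above.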

 Furthermore, a quantity may be given as 4.10-4.20-4.35. The precision
 of measurement and the measure of variance and dispersion are
 separate concepts.

 Hm, somewhere in the scope of mechanical engineering there also exist
 ±-values where the tolerances up and down differ from each other. E.g.:
 it should be 11.2, but it may be 11.1 or 11.35.

I'd suggest storing such additional information in a Qualifier instead of the
Data Value itself.

 I fear that is a view of how data in a perfect world should be known,
 not a reflection of the kind of data that people need to store in
 Wikidata. Very often only the precision will be known or available to
 its authors, or worse, the source may not say which it is.

 I think this is a matter of Wikidata definitions. For years now, precision
 has been used for the number of digits behind the decimal point. Now we
 need another word for expressing how accurate a value is. Therefore: do we
 have a glossary?

Indeed we do: https://wikidata.org/wiki/Wikidata:Glossary

I use precision exactly like that: significant digits when rendering output or
parsing input. It can be used to *guess* at the value's accuracy, but is not
the same.

-- daniel

-- 
Daniel Kinzler, Softwarearchitekt
Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.




Re: [Wikidata-l] Data values

2012-12-19 Thread Friedrich Röhrs
When we speak about dimensions, we talk about properties, right?

So when I define the property height of a person as an entity, I would
supply the SI unit (m) and the SI multiple (-2, cm) that it should be saved
in (in the database).

When someone then inputs the height in meters (e.g. 1.86m) it would be
converted to the matching SI multiple before being saved (i.e. 186 (cm)).

On the database side each SI multiple would get its own table so that
indexes can easily be made. Depending on which multiple we choose in the
property, the datavalue would be saved to a different table. Did I get the
idea correctly?
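
As a sketch of the idea just described (illustrative property definition,
not actual Wikibase code):

<?php
// Each property fixes the SI unit and the SI multiple (power of ten
// relative to the base unit) that values are stored in.
$property = array('unit' => 'm', 'multiple' => -2); // heights stored in cm

// Convert a value entered in base units to the storage multiple.
function toStorage($amountInBase, array $property) {
    return $amountInBase / pow(10, $property['multiple']);
}

// toStorage(1.86, $property) gives 186.0, i.e. 1.86m saved as 186 (cm).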


On Wed, Dec 19, 2012 at 4:47 PM, Gregor Hagedorn g.m.haged...@gmail.com wrote:

 On 19 December 2012 15:11, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
  If they measure the same dimension, they should be saved using the same
  unit (probably the SI base unit for that dimension). Saving values using
  different units would make it impossible to run efficient queries against
  these values, thereby defeating one of the major reasons for Wikidata's
  existence. I don't see a way around this.

 Daniel confirms (in a separate mail) that Wikidata indeed intends to
 convert any derived SI units to a common formula of base units.

 Example: a quantity like 1013 hectopascal, the common unit for
 meteorological barometric pressure (this used to be millibar), would be
 stored and re-displayed as
 1.013 × 10^5 kg⋅m⁻¹⋅s⁻²

 I see several problems with this approach:

 1. Many base units are little known: kg⋅m²⋅s⁻³⋅A⁻² for the ohm... It
 breaks communication with humans curating data on Wikidata. It will
 make it very difficult to check data entered into Wikidata for
 correctness, because the data displayed after saving will have little
 relation to the data entered. This makes Wikidata inherently
 unsuitable for an effort like Wikipedia, with many authors and the
 reliance on fact checking.

 2. Even for standard base units, there is often a 1:n relation, e.g.
 both gray and sievert have the same base units. The base unit for
 lumen is candela (because the steradian is not a base unit, but part
 of the derived unit's definition).

 Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread Herman Bruyninckx

On Wed, 19 Dec 2012, Denny Vrandečić wrote:


Martynas,
could you please let me know where RDF or any of the W3C standards covers
topics like units, uncertainty, and their conversion. I would be very much
interested in that.


NIST has created a standard in OWL: QUDT -- Quantities, Units, Dimensions
and Data Types in OWL and XML:
 http://www.qudt.org/qudt/owl/1.0.0/index.html

I fully share Martynas' concerns: most of the problems that are being
discussed in this thread (and that are very relevant and interesting)
should not be solved with an object-oriented approach (that is, via
properties of objects, and inheritance) but by semantic modelling (that
is, composition of knowledge). For example, one single database
representation of a unit can have multiple displays, depending on who
wants to see the unit and in which context; the viewer and the context
are rather simple to add via semantic primitives. For example, the Topic
Map semantic standard would fit here very well, in my opinion:
 http://en.wikipedia.org/wiki/Topic_map.




Herman



Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
 it is probably necessary to store the number of
 significant decimals.

 Yes, that *is* the accuracy value I mean.

Daniel, please use correct terms. Accuracy is a defined concept, and
although by convention it may be roughly expressed by using the number
of significant figures, that is not the same concept. Without
additional information you cannot infer backwards whether usage of
significant figures expresses accuracy or precision. See
http://en.wikipedia.org/wiki/Accuracy_and_precision

 Ok, there's some terminology confusion here. I'm using accuracy to refer
 to the accuracy of measurement (e.g. standard deviation), and precision to
 refer to the precision of presentation (e.g. significant digits). We need
 these two things at least, and words for them. I don't care much which
 words we use.

I do. And I think it is important for Wikidata to precisely express
what it wants to achieve.

Accuracy has nothing to do with s.d., which is a measure of
dispersion. You can have an accuracy of +/- 10 measured with a
precision of +/- 0.1 (and a standard deviation for the population of
objects that you have measured of 2).


-

 So 4.10
 means that the last digit is significant, i.e. the best estimate is at
 least between 4.095 and 4.105 (but it may be better). 4.10 +/- 0.005
 means it is precisely between 4.095 and 4.105, as opposed to 4.10 +/- 0.004,
 4.10 +/- 0.003, 4.10 +/- 0.002 etc.

Yes, all this should be handled by the component responsible for parsing user
input for quantity values.

But it cannot be, because you have lost the information. I don't know
whether +/- 0.005 indicates significant figures/digits or whether it
is an exact precision-or-accuracy interval.

I think this may become clearer if you consider a value entered in inches:

1.20 inches.
You convert:
1.20 +/- 0.05 in = 3.048 × 10^-2 m +/- 1.27 × 10^-3 m

If this is the only information stored, I have no information left on
whether I should display 3.048 or 3.0480, and whether the information
+/- 1.27 × 10^-3 m is meaningful (no) or an artifact of conversion
(yes).




 It can be stored as an auxiliary data point, that is, as a qualifier
 (measured in feet). It should not IMHO be part of the data value as such,
 because that would make it extremely hard to use the values in a database.

You are correct insofar as I propose you need to store two units:
the normalized one (SI units only, and no prefix -- and even though the
SI base unit is kg, I would store gram) and the original one plus the
original unit prefix.

If you do that, you can store the value in a single normalized unit,
provided you back-convert it prior to display in Wikidata.

I don't think the original unit is a meaningless qualifier; it is
vital information for context.
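
As a sketch, such a record could look like this (hypothetical layout: the
normalized value serves querying, the remaining fields recreate the
original form):

<?php
// Normalized value for indexing, plus the original unit, prefix and
// significant decimals for faithful re-display.
$record = array(
    'normalized'     => 3.048e-2, // metres, for indexing and comparison
    'originalUnit'   => 'in',     // unit the contributor actually used
    'originalPrefix' => '',       // SI prefix; none for inches
    'sigDecimals'    => 2,        // "1.20" had two decimals
);

// Recreate the original rendering: convert back, re-apply the decimals.
$inches = $record['normalized'] / 0.0254;            // metres -> inches
echo number_format($inches, $record['sigDecimals']); // prints "1.20"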

Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
On 19 December 2012 17:03, Daniel Kinzler daniel.kinz...@wikimedia.de wrote:
 I'd have thought that we'd have one such table per dimension (such as length
 or weight). It may make sense to override that on a per-property basis, so
 2300m elevation isn't shown as 2.3km. Or that can be done in the template that
 renders the value.

here and in the entire discussion I fear that the need to support
curation of Wikidata data for correctness is not sufficiently in
focus.

If someone enters the height of a mountain in feet and I see the
converted value in meters in my preference-converted Wikidata view, I
will correct the seemingly senseless and unjustified precision to
three digits after the meter. Only if we understand in which unit the
data were originally valid will we be able to successfully
communicate and collaborate.

Yes, Wikidata shall store a normalized version of the value, but it
also needs to store an original one. Whether it needs to store the
value twice I am not sure; I believe not. If it stores the original
prefix, original unit and original significant digits, it can
generally recreate the original form. I know that there are some
pitfalls with IEEE numbers in this, and it may be safer to store the
original number as well initially (and perhaps drop it later, when
enough data are available to test the effects).

Of course, Wikipedias can use the API to display the value in any
other form, just as they like, but that does not solve the problem of
data curation on Wikidata (which includes the data curation done by
Wikipedia authors).

Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread Martynas Jusevičius
Denny,

you're sidestepping the main issue here -- every sensible architecture
should build on as many existing standards as possible, and build its own
custom solution only if a *very* compelling reason is found to do so,
instead of finding a compromise between the requirements and the
standard. Wikidata seems to be constantly doing the opposite --
building a custom solution with whatever reason, or even without one.
This drives compatibility and reuse towards zero.

This thread originally discussed datatypes for values such as numbers,
dates and their intervals -- semantics for all of those are defined in
XML Schema Datatypes: http://www.w3.org/TR/xmlschema-2/
All the XML and RDF tools are compatible with XSD, yet I don't
think there is even a single mention of it in this thread. What makes
Wikidata so special that its datatypes cannot build on XSD? And this
is only one of the issues; I've pointed out others earlier.

Martynas
graphity.org


On Wed, Dec 19, 2012 at 5:58 PM, Denny Vrandečić
denny.vrande...@wikimedia.de wrote:
 Martynas,

 could you please let me know where RDF or any of the W3C standards covers
 topics like units, uncertainty, and their conversion. I would be very much
 interested in that.

 Cheers,
 Denny





Re: [Wikidata-l] Data values

2012-12-19 Thread Sven Manguard
My philosophy is this: We should do whatever works best for Wikidata and
Wikidata's needs. If people want to reuse our content, and the choices
we've made make existing tools unworkable, they can build new tools
themselves. We should not be clinging to what's been done already if it
gets in the way of what will make Wikidata better. Everything that we
make and do is open, including the software we're going to operate the
database on. Every WMF project has done things differently from the
standards of the time, and people have developed tools to use our content
before. Wikidata will be no different in that regard.

Sven


Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
Martynas,

I think you misinterpret the thread. There is no discussion about not
building on the datatypes defined in http://www.w3.org/TR/xmlschema-2/

What we are discussing is compositions of elements, all typed to
XML datatypes, that shall be able to express scientific and
engineering requirements as to statistics and significant digits
(except perhaps for duration, none of the data types in
http://www.w3.org/TR/xmlschema-2/ supports that), as well as means to
express uncertainty and confidence intervals.

Many existing XML schemata define such compositions, all squarely
built on http://www.w3.org/TR/xmlschema-2/ -- Wikidata is certainly not
unique in this effort. If you can point the team to further well-reviewed
solutions, that would be very useful.

Gregor



Re: [Wikidata-l] Data values

2012-12-19 Thread jmcclure
 

I suspect what Martynas is driving at is that XMLS defines **FACETS** for
its datatypes -- accepting those as a baseline, and then extending them to
your requirements, is a reasonable, community-oriented process. However,
wrapping oneself in the flag of open development is to me unresponsive to a
simple plea to stand on the shoulders of giants gone before, and to act in
a responsible manner cognizant of the interests of the broader community.

And personally I have to say I don't like the word clinging -- clearly a
red flag meant to inflame if not insult. This is no place for that!

On 19.12.2012 09:47, Sven Manguard wrote:

 My philosophy is this: We should do whatever works best for Wikidata and
 Wikidata's needs. If people want to reuse our content, and the choices
 we've made make existing tools unworkable, they can build new tools
 themselves. We should not be clinging to what's been done already if it
 gets in the way of what will make Wikidata better. Everything that we make
 and do is open, including the software we're going to operate the database
 on. Every WMF project has done things differently from the standards of
 the time, and people have developed tools to use our content before.
 Wikidata will be no different in that regard.

 Sven
 

Re: [Wikidata-l] Data values

2012-12-19 Thread Tom Morris
Wow, what a long thread.  I was just about to chime in to agree with Sven's
point about units when he interjected his comment about blithely ignoring
history, so I feel compelled to comment on that first.  It's fine to ignore
standards *for good reasons*, but doing it out of ignorance or gratuitously
is just silly.  Thinking that WMF is so special it can create a better
solution without even know what others have done before is the height of
arrogance.

Modeling time and units can basically be made arbitrary complex, so the
trick is in achieving the right balance of complexity vs utility.  Time is
complex enough that I think it deserves it's own thread.  The first thing
I'd do is establish some definitions to cover some basics like
durations/intervals, uncertain dates, unknown dates, imprecise dates, etc
to that everyone is using the same terminology and concepts.  Much of the
time discussion is difficult for me to follow because I have to guess at
what people mean.  In addition to the ability to handle circa/about dates
already mentioned, it's also useful to be able to represent before/after
dates e.g. he died before 1 Dec 1792 when his will was probated.  Long term
I suspect you'll need support for additional calendars rather than
converting everything to a common calendar, but only supporting Gregorian
is a good way to limit complexity to start with.  Geologic times may
(probably?) need to be modeled differently.

Although I disagree strongly with Sven's sentiments about the
appropriateness of reinventing things, I believe he's right about the need
to support more units than just SI units and to know what units were used
in the original measurement.  It's not just a matter of aesthetics but of
being able to preserve the provenance.  Perhaps this gets saved for a
future iteration, but you may find that you need both display and
computable versions of things stored separately.

Speaking of computable versions don't underestimate the issues with using
floating points numbers.  There are numbers that they just can't represent
and their range is not infinite.

Historians and genealogists have many interminable discussions about
date/time representation which can be found in various list archives, but
one recent spec worth reviewing is Extended Date/Time Format (EDTF)
http://www.loc.gov/standards/datetime/pre-submission.html

Another thing worth looking at is the Freebase schema since it not only
represents a bunch of this stuff already, but it's got real world data
stored in the schema and user interface implementations for input and
rendering (although many of the latter could be improved).  In particular,
some of the following might be of interest:

http://www.freebase.com/view/measurement_unit /
http://www.freebase.com/schema/measurement_unit
http://www.freebase.com/schema/time
http://www.freebase.com/schema/astronomy/celestial_object_age
http://www.freebase.com/schema/time/geologic_time_period
http://www.freebase.com/schema/time/geologic_time_period_uncertainty

If you rummage around, you can probably find lots of interesting examples
and decide for yourself whether or not that's a good way to model things.
 I'm reasonably familiar with the schema and happy to answer questions.

There are probably lots of other example vocabularlies that one could
review such as the Pleiades project's: http://pleiades.stoa.org/vocabularies

You're not going to get it right the first time, so I would just start with
a small core that you're reasonably confident in and iterate from there.

Tom

On Wed, Dec 19, 2012 at 12:47 PM, Sven Manguard svenmangu...@gmail.comwrote:

 My philosophy is this: We should do whatever works best for Wikidata and
 Wikidata's needs. If people want to reuse our content, and the choices
 we've made make existing tools unworkable, they can build new tools
 themselves. We should not be clinging to what's been done already if it
 gets in the way of what will make Wikidata better. Everything that we
 make and do is open, including the software we're going to operate the
 database on. Every WMF project has done things differently from the
 standards of the time, and people have developed tools to use our content
 before. Wikidata will be no different in that regard.

 Sven


 On Wed, Dec 19, 2012 at 12:27 PM, Martynas Jusevičius 
 marty...@graphity.org wrote:

 Denny,

 you're sidestepping the main issue here -- every sensible architecture
 should build on as much previous standards as possible, and build own
 custom solution only if a *very* compelling reason is found to do so
 instead of finding a compromise between the requirements and the
 standard. Wikidata seems to be constantly doing the opposite --
 building a custom solution with whatever reason, or even without it.
 This drives the compatibility and reuse towards zero.

 This thread originally discussed datatypes for values such as numbers,
 dates and their intervals -- semantics for all of those are defined in
 XML Schema Datatypes: 

Re: [Wikidata-l] Reusing Languages Translation (was: Data values)

2012-12-19 Thread swuensch
It would be much more easier if this could be done automatically, so
everybody could set there preferred data system SI or CGS or what ever.

Sk!d
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] .name = text property

2012-12-19 Thread jmcclure
 

Using the dotted notation, XSD datatype facets such as below can be
specified easily as properties using a simple colon: 

 Property:
.anyType:equal - (sameAs equivaluent) redirect to page/object with
actual numeric value
 Property: .anyType:ordered - a boolean property

Property: .anyType:bounded - a boolean property
 Property:
.anyType:cardinality - a boolean property
 Property: .anyType:numeric -
a boolean property
 Property: .anyType:length - number of chars allowed
for value
 Property: .anyType:minLength - min nbr of chars for value

Property: .anyType:maxLength - max nbr of chars for value
 Property:
.anyType:pattern - regex string
 Property: .anyType:enumeration -
specified values comprising value space
 Property: .anyType:whiteSpace -
reserve or replace or collapse
 Property: .anyType:maxExclusive - number
for an upper bound
 Property: .anyType:maxInclusive - number for an
upper bound
 Property: .anyType:minExclusive - number for an lower
bound
 Property: .anyType:minInclusive - number for an lower bound

Property: .anyType:totalDigits - number of total digits
 Property:
.anyType:fractionDigits - number of digits in the fractional part of a
number 

An anonymous object is used to represent namespace-qualified
(text  url) values eg_ rdf:about_: 

 Property: .:rdf:about - this is a
.url value for an RDF about property for a page/object
 Property:
.:skos:prefLabel - this is a .name value for a page/object 

I suggest
that properties for precision can be found in XSD facets above.
- john


On 19.12.2012 12:41, jmccl...@hypergrove.com wrote: 

 Here's a
suggestion. Property names for numeric information seem to be on the
table -- these should be viewed systematically not haphazardly. 
 
 If
all text properties had a dotted lower-case name, life would be
simpler in SMW land all around and maybe Wikidata land too. All page
names have an initial capital as a consequence of requiring all text
properties to be named with an initial period followed by a lower-case
letter. The SMW tool mandates the properties from which all derive:
.text, .string and .number are basic (along with others like .page).
Then, strings have language-based subproperties and number expression
subproperties, and numbers have XSD datatype subpropertiess, which in
turn have SI unit type subproperties, and so on. 
 
 Here's a
Consolidated Listing of ISO 639, ISO 4217, SI Measurement Symbols, and
World Time Zones [2] [1] to illustrate that it is possible to create a
unified string-  numeric-type property name dictionary across a wide
swath of the standards world. The document lists a few overlapping
symbols then re-assigned to another symbol. 
 
 Adopting a dotted
name text-property naming convention, can segue to easier user
interfaces too for query forms at least plus impacts exploited by an SMW
query engine. What is meant by these expressions seems pretty natural to
most people: 
 
 Property: Height - the value is a wiki pagename or
objectname for a height numeric object
 Property: .text - (on Height)
the value is text markup associated with the Height object
 Property:
.string - (on Height) the value is text non-markup data for the Height
object
 Property: .ft - (on Height) the value is number of feet
associated with the Height object
 Property: Height.text - the value is
text markup associated with an anonymous Height object
 Property:
Height.string - the value is a string property of an anonymous Height
object
 Property: Height.ft - the value is a feet property of an
anonymous Height object
 
 [1]
http://www.hypergrove.com/Publications/Symbols.html 
 

___
 Wikidata-l mailing
list
 Wikidata-l@lists.wikimedia.org

https://lists.wikimedia.org/mailman/listinfo/wikidata-l [1]




Links:
--
[1]
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
[2]
http://www.hypergrove.com/Publications/Symbols.html
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Data values

2012-12-19 Thread Gregor Hagedorn
On 19 December 2012 20:01,  jmccl...@hypergrove.com wrote:
 Hi Gregor - the root of the misconception I likely have about significant
 digits and the like, is that such is one example of a rendering parameter
 not a semantic property.

It is about semantics, not formatting.

In science and engineering, the number of significant digits is not
used to right align numbers, but to semantically indicate the order of
magnitude of the accuracy and/or precision of a measurement or
quantity. Thus, the weight of a machine can be given as 1.2 t (exact
to +/- 50 kg), 1200 kg  (+/- 1 kg), or 1200.000 g.

This is not part of IEEE floating point numbers, which always have the
type dependent same precision or number of significant digits,
regardless whether this is semantically justified or not. IEEE 754
standard double always has about 16 decimal significant digits, i.e.
the value 1.2 tons will always be given as 1.200 tons.
This is good for calculations, but lacks the information for final
rounding.

Gregor

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Data values

2012-12-19 Thread jmcclure
 

totally agree - hopefully XSD facets provide a solid start to
meeting those concrete requrements - thanks. 

On 19.12.2012 14:09,
Gregor Hagedorn wrote: 

 On 19 December 2012 20:01,
jmccl...@hypergrove.com wrote:
 
 Hi Gregor - the root of the
misconception I likely have about significant digits and the like, is
that such is one example of a rendering parameter not a semantic
property.
 
 It is about semantics, not formatting.
 
 In science
and engineering, the number of significant digits is not
 used to right
align numbers, but to semantically indicate the order of
 magnitude of
the accuracy and/or precision of a measurement or
 quantity. Thus, the
weight of a machine can be given as 1.2 t (exact
 to +/- 50 kg), 1200
kg (+/- 1 kg), or 1200.000 g.
 
 This is not part of IEEE floating
point numbers, which always have the
 type dependent same precision or
number of significant digits,
 regardless whether this is semantically
justified or not. IEEE 754
 standard double always has about 16 decimal
significant digits, i.e.
 the value 1.2 tons will always be given as
1.200 tons.
 This is good for calculations, but lacks the
information for final
 rounding.
 
 Gregor
 

___
 Wikidata-l mailing
list
 Wikidata-l@lists.wikimedia.org

https://lists.wikimedia.org/mailman/listinfo/wikidata-l [1]




Links:
--
[1]
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Data values

2012-12-19 Thread jmcclure
 

For me the question is how to name the precision information. Do not
the XSD facets totalDigits and fractionDigits work well enough? I
mean 

 .number:totalDigits contains a positive power of ten for
precision
 .number:fractionDigits contains a negative power of ten for
precision 

The use of the word datatype is always interesting as
somehow it's meant organically different from the measurement to which
it's related. Both are resources with named properties - what are those
names? Certain property names derived from (international standards)
should be considered builtin to whatever foundation the implementing
tool procides. I suggest that XSD names be used at least for concepts
that appear to be the same, with or without the xsd: xml-namespace
prefix. 

But the word datatype fascinates me even more ever since SMW
internalized the Datatype namespace. Because to me RDF made an error
back when the rdf:type property got the range Class, when it should have
been Datatype (though politics got in the way!) It gets more twisted, as
now Category is the chosen implementation of rdfs:Class. The problem
that presents is that categories are lists and a class (that is,
rdf:type value) is, for some singular, and for others a plural, concept
or label. Pure semantic mayhem. 

I'm happy SMW internalized the
datatype namespace to the extent it maps to its software chiefly because
it clarifies that a standard Type namespace is needed -- which
contains singular noun phrases -- which is the value range for rdf:type
(if you will) properties. All Measurement types (eg Feet, Height 
Lumens) would be represented there too, like any other class, with its
associated properties that (in the case of numerics) would include
.totalDigits and .fractionDigits. 

Going this route -- establishing
a standard Type namespace -- would allow wikis to have a separate
vocabulary of singular noun phrases not in the Category namespace. The
ultimate goal is to associate a given Type to its implemention as a wiki
namespace, subpage or subobject; the Category namespace itself is
already overloaded to handle that task. 

-john 

On 19.12.2012 14:50,
Gregor Hagedorn wrote: 

 totally agree - hopefully XSD facets provide
a solid start to meeting those concrete requrements
 
 they don't.
They allow to define derived datatypes and thus apply to
 the datatype,
not the measurement. Different measurements of the same
 datatype may
be of different precision. --gregor
 

___
 Wikidata-l mailing
list
 Wikidata-l@lists.wikimedia.org

https://lists.wikimedia.org/mailman/listinfo/wikidata-l [1]




Links:
--
[1]
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Data values

2012-12-19 Thread Sven Manguard
I think that Tom Morris tragically misunderstood my point, although that
was likely helped by the fact that, as I've insinuated already, the many
standards and acronyms being thrown about are largely lost on me.

My point is not We can just throw everything out because we're big and
awesome and have name brand power. My point was We're going to reach a
point where some of the existing standards and tools just don't work
because when they were built things like Wikidata weren't envisioned. We
need to have the mindset that developing new pieces that work for us is
better than trying to force a square peg into a round hole just because
something is already widely used. If what exists doesn't work, we're going
to do more harm than good if we have to start cutting corners or cutting
features to try and get it to work. We have an infrestructure that would
allow third parties to come along later and build tools that allow there to
be a bridge between whatever we create and whatever exists already.

Sven

On Wed, Dec 19, 2012 at 2:40 PM, Tom Morris tfmor...@gmail.com wrote:

 Wow, what a long thread.  I was just about to chime in to agree with
 Sven's point about units when he interjected his comment about blithely
 ignoring history, so I feel compelled to comment on that first.  It's fine
 to ignore standards *for good reasons*, but doing it out of ignorance or
 gratuitously is just silly.  Thinking that WMF is so special it can create
 a better solution without even know what others have done before is the
 height of arrogance.

 Modeling time and units can basically be made arbitrary complex, so the
 trick is in achieving the right balance of complexity vs utility.  Time is
 complex enough that I think it deserves it's own thread.  The first thing
 I'd do is establish some definitions to cover some basics like
 durations/intervals, uncertain dates, unknown dates, imprecise dates, etc
 to that everyone is using the same terminology and concepts.  Much of the
 time discussion is difficult for me to follow because I have to guess at
 what people mean.  In addition to the ability to handle circa/about dates
 already mentioned, it's also useful to be able to represent before/after
 dates e.g. he died before 1 Dec 1792 when his will was probated.  Long term
 I suspect you'll need support for additional calendars rather than
 converting everything to a common calendar, but only supporting Gregorian
 is a good way to limit complexity to start with.  Geologic times may
 (probably?) need to be modeled differently.

 Although I disagree strongly with Sven's sentiments about the
 appropriateness of reinventing things, I believe he's right about the need
 to support more units than just SI units and to know what units were used
 in the original measurement.  It's not just a matter of aesthetics but of
 being able to preserve the provenance.  Perhaps this gets saved for a
 future iteration, but you may find that you need both display and
 computable versions of things stored separately.

 Speaking of computable versions don't underestimate the issues with using
 floating points numbers.  There are numbers that they just can't represent
 and their range is not infinite.

 Historians and genealogists have many interminable discussions about
 date/time representation which can be found in various list archives, but
 one recent spec worth reviewing is Extended Date/Time Format (EDTF)
 http://www.loc.gov/standards/datetime/pre-submission.html

 Another thing worth looking at is the Freebase schema since it not only
 represents a bunch of this stuff already, but it's got real world data
 stored in the schema and user interface implementations for input and
 rendering (although many of the latter could be improved).  In particular,
 some of the following might be of interest:

 http://www.freebase.com/view/measurement_unit /
 http://www.freebase.com/schema/measurement_unit
 http://www.freebase.com/schema/time
 http://www.freebase.com/schema/astronomy/celestial_object_age
 http://www.freebase.com/schema/time/geologic_time_period
 http://www.freebase.com/schema/time/geologic_time_period_uncertainty

 If you rummage around, you can probably find lots of interesting examples
 and decide for yourself whether or not that's a good way to model things.
  I'm reasonably familiar with the schema and happy to answer questions.

 There are probably lots of other example vocabularlies that one could
 review such as the Pleiades project's:
 http://pleiades.stoa.org/vocabularies

 You're not going to get it right the first time, so I would just start
 with a small core that you're reasonably confident in and iterate from
 there.

 Tom

 On Wed, Dec 19, 2012 at 12:47 PM, Sven Manguard svenmangu...@gmail.comwrote:

 My philosophy is this: We should do whatever works best for Wikidata and
 Wikidata's needs. If people want to reuse our content, and the choices
 we've made make existing tools unworkable, they can build new tools
 themselves. We 

[Wikidata-l] qudt ontology facets

2012-12-19 Thread jmcclure
 

The NIST ontology defines 4 basic classes that are great: 

_qudt:QuantityKind [11]_, _qudt:Quantity [12]_, _qudt:QuantityValue
[13]_, _qudt:Unit [14]_ 

but the properties set leaves me a bit
thirsty. Take Area as an example. I'd like to reference properties
named .ft2 and .m2 so that, for instance, an annotation might be
[[Leasable area.ft2::12345]]. To state the precision applicable to that
measurement, might be [[Leasable area.ft2:fractionDigits :: 0]] to
indicate say, rounding. However, in the NIST ontology, there is no ft2
property at all -- this is an SI unit though, so it seems identifying
first the system of measurement units, and then the specific measurement
unit is not a great idea because these notations are then divorced from
the property name itself, a scenario guaranteed to cause more user
errors  omissions I think. 

Someone's mentioned uncertainty facets, so
I suggest these from the qudt ontology: 

 Property:
.anyType:relativeStandardUncertainty
 Property:
.anyType:standardUncertainty 

Other facets noted might include 


Property: .anyType:abbreviation
 Property: .anyType:description

Property: .anyType:symbol 

-john 

On 19.12.2012 08:10, Herman
Bruyninckx wrote: 

 On Wed, 19 Dec 2012, Denny Vrandečić wrote:
 

Martynas, could you please let me know where RDF or any of the W3C
standards covers topics like units, uncertainty, and their conversion. I
would be very much interested in that.
 
 NIST has created a standard
in OWL: QUDT - Quantities, Units, Dimensions and Data Types in OWL and
XML:
 http://www.qudt.org/qudt/owl/1.0.0/index.html [5]
 
 I fully
share Martynas' concerns: most of the problems that are being

discussed in this thread (and that are very relevant and interesting)

should not be solved with an object oriented approach (that is, via

properties of objects, and inheritance) but by semantic modelling
(that
 is, composition of knowledge). For example, one single data
base
 representation of a unit can have multiple displays depending
on who
 wants to see the unit, and in which context; the viewer and the
context are
 rather simple to add via semantic primitives. For example,
the Topic Map
 semantic standard would fit here very well, in my
opinion:
 http://en.wikipedia.org/wiki/Topic_map [6].
 
 Cheers,
Denny
 
 Herman
 http://people.mech.kuleuven.be/~bruyninc Tel: +32
16 328056 Vice-President Research euRobotics http://www.eu-robotics.net
[7] Open RObot COntrol Software http://www.orocos.org [8] Associate
Editor JOSER http://www.joser.org [9], IJRR http://www.ijrr.org [10]

 
 ___
 Wikidata-l
mailing list
 Wikidata-l@lists.wikimedia.org

https://lists.wikimedia.org/mailman/listinfo/wikidata-l [1]




Links:
--
[1]
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
[2]
http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00056.html
[3]
http://www.mail-archive.com/wikidata-l@lists.wikimedia.org/msg00750.html
[4]
http://wikimedia.de
[5]
http://www.qudt.org/qudt/owl/1.0.0/index.html
[6]
http://en.wikipedia.org/wiki/Topic_map
[7]
http://www.eu-robotics.net
[8] http://www.orocos.org
[9]
http://www.joser.org
[10] http://www.ijrr.org
[11]
http://www.qudt.org/qudt/owl/1.0.0/qudt/index.html#QuantityKind
[12]
http://www.qudt.org/qudt/owl/1.0.0/qudt/index.html#Quantity
[13]
http://www.qudt.org/qudt/owl/1.0.0/qudt/index.html#QuantityValue
[14]
http://www.qudt.org/qudt/owl/1.0.0/qudt/index.html#Unit
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Wikimania videos of Wikidata sessions

2012-12-19 Thread Katie Filbert
Finally, we have the rest of the Wikimania videos available, including this
one of the Wikidata panel in the sister projects session:

http://www.youtube.com/watch?v=xi8Yf9c3wXg (starts at 22:45)

The other Wikidata session is here:

http://www.youtube.com/watch?v=05HxNwxiNZ0

Cheers,
Katie

-- 
Katie Filbert
Wikidata Developer

Wikimedia Germany e.V. | NEW: Obentrautstr. 72 | 10963 Berlin
Phone (030) 219 158 26-0

http://wikimedia.de

Wikimedia Germany - Society for the Promotion of free knowledge eV Entered
in the register of Amtsgericht Berlin-Charlottenburg under the number 23
855 as recognized as charitable by the Inland Revenue for corporations I
Berlin, tax number 27/681/51985.
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Data values

2012-12-19 Thread Peter Jacobi
If one has time to read prior art, I'd suggest giving the Health Level
7 v3.0 Data Types Specification
http://amisha.pragmaticdata.com/v3dt/report.html a look.

Of course HL7 has a lot of things to worry about which are off topic
for us, starting with a prior completely different version of the
standard. And much emphasis goes to coded values (enums) and coding
systems, but it is a nice review of issues found and solved by many
eyeballs and years.


Peter

___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l