David,
On 28/05/14 16:35, David Cuenca wrote:
Markus,
Ok, now I understand that "same as" wouldn't be a good name for the
confusion it would cause. However the property "subject of" as it is now
wouldn't be a good candidate either. Its meaning is that a certain
statement is represented by another item (that is why it is only allowed
to be used as qualifier).
Ok.
Perhaps a better name would be "corresponds with item" and the inverse
"corresponds with property". Just by having these connections, a lot of
information can be inferred from the connected item.
Consider the following example with "occupation (P106)", and "occupation
(Q13516667)":
- I cannot find any clear "subproperty of" for P106, but there is a
clear "subclass of:human behaviour" for the item
- "human behaviour" is "part of" human
I don't understand this use of "part of". Maybe I would say "having an
occupation is part of being human" but not that "occupation is part of
human". I would not use either of these and restrict "part of" to clear,
undisputed statements like "the steering wheel is part of the car".
Otherwise, anything could be part of human ("head"?, "sadness"?,
"singing"?, "birth"? -- entering this in Wikidata would not lead anywhere).
"Part of" is quite problematic in general. You can see it from the
discussion on its property page, and also from the uses it sees in the
wiki, that this property is severely misunderstood and/or misused. At
the very least, one should distinguish "physical part of" from "meronym"
(both are aliases of the property now!). And then one should realise
that meronyms are in the domain of Wiktionary, which we cannot capture
in Wikidata properly since we do not have items for words but for
concepts. One alias for an item might be a meronym of something else,
while another alias for the same item is not. Using statements for
linguistic properties in Wikidata will not be successful. I am not
saying that Wikibase is not able to capture some ideas of a thesaurus
(we have actually discussed this), but this is not how it is used in
Wikidata.
- "human" can have a statement "intrinsic property" (property proposal
still under discussion) with values "birthday (Q47223)" and an
"(eventual) date of death". It can be expanded in the future to include
newly created properties like "height", "weight", "eye color", etc
Yes, this again makes sense to me. It is basically a variant of the
constraint "Item" which allows you to say that items that are instance
of human should also have a birthday. But again, this is schematic
information (like constraints) and it should not be mixed up with actual
data. It is the same conceptual difference that I have explained for
properties vs. items earlier. Moreover, I think this information (even
if correct in some sense) has very little utility as a piece of
information about an item; it is much more useful for constraints about
properties (which are not items).
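To make the schematic nature of such a rule concrete, here is a minimal sketch of a check that every instance of a class (directly or through "subclass of") carries a required property. The in-memory dictionaries, the item ids starting with "Q_", and the function names are assumptions made up for the example; this is not how the Wikidata constraint reports are implemented.

```python
from typing import Dict, List, Set

# Hypothetical "subclass of" (P279) edges: Q_sub is a subclass of human (Q5).
SUBCLASS_OF: Dict[str, Set[str]] = {"Q_sub": {"Q5"}}

# Hypothetical item data: property id -> list of values.
ITEMS: Dict[str, Dict[str, List[str]]] = {
    "Q42": {"P31": ["Q5"], "P569": ["1952-03-11"]},  # has a date of birth
    "Q_a": {"P31": ["Q5"]},                          # human, no P569
    "Q_b": {"P31": ["Q_sub"]},                       # indirect human, no P569
    "Q45190": {"P31": ["Q_material"]},               # cement: not a human
}


def superclasses(cls: str) -> Set[str]:
    """Return cls together with everything reachable via 'subclass of'."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def missing_required(required_prop: str, cls: str) -> List[str]:
    """List items that are instances of cls but lack required_prop."""
    violations = []
    for item_id, claims in ITEMS.items():
        classes: Set[str] = set()
        for direct in claims.get("P31", []):  # P31 = instance of
            classes |= superclasses(direct)
        if cls in classes and required_prop not in claims:
            violations.append(item_id)
    return sorted(violations)


violations = missing_required("P569", "Q5")
```

Note that the rule itself ("humans should have P569") lives outside the item data: it is passed in as arguments, which mirrors the point that it is information about the property rather than a fact about the item.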
- birthday (Q47223) <corresponds with property> date of birth (P569)
It should be the other way around: the correspondence says something
about P569, not about Q47223. There cannot be any reference for this. It
should therefore be a claim on the page of P569 rather than a statement
on the page of Q47223.
Out of this I reach the following conclusions:
- the taxonomy of properties is going to be weak, since there is not
always a clear subpropertyOf unless created artificially (more work)
I agree.
- the standard taxonomy of items (subclass of/part of) is sufficient
to automatically reach meaningful constraints and inference (less work)
I agree that the taxonomy will be helpful in constraints. This is what
constraints already do when using instance of/subclass of. However, I do
not agree that the constraints can or should be stated as part of this
taxonomy. Constraints are too complex, and they are conceptually
different (they say how a property should be used, not how something in
the Real World relates to something else). Constraints interact nicely
with the taxonomy and help to get useful conclusions, but they are not
"part of" taxonomy ;-). We must keep content organisation separate from
content.
- by adding manually the constraints to the property itself we are
duplicating information which will require volunteer effort to maintain
(more work)
I disagree. Constraints refer to the property, not to the Wikidata item,
and it would be conceptually wrong to mix these things up. We already
have agreed that properties and items need to remain distinct for
technical reasons. Once this is clear, there is no reason to move
information that refers to properties (constraints) to item pages. This
will not be a duplication of information: it is enough to have the
constraints on the property pages only. If you look at the constraints
we have, you can see many examples that are specific to Wikidata and
certainly not a general thing about the concept (take the "allowed
values" for "sex or gender"). We really want to keep editorial helpers
(constraints) distinct from sourced information (statements about items).
My recommendation is to rely mainly on the main taxonomy instead of
creating a parallel property taxonomy, and then think of ways to extract
information from the main taxonomy to convert it automatically into
constraints.
All the maintenance takes effort, so the more it can be automated, the
more efficient volunteers will be. And if we can simplify the
maintenance of properties, we will be able to simplify the creation of
properties too, especially when we face the next surge which will come
with the datatype "number with units".
I agree with the general goals, but I don't think that things become any
easier if we confuse information about properties with information about
items. We can still re-use information we have about items (like the
class hierarchy that we already use in constraints) to avoid
duplication, but some things are clearly not part of the item taxonomy.
Cheers,
Markus
On Wed, May 28, 2014 at 2:48 PM, Markus Krötzsch
<mar...@semantic-mediawiki.org> wrote:
David,
Regarding the question of how to classify properties and how to
relate them to items:
* "same as" (in the sense of owl:sameAs) is not the right concept
here. In fact, it has often been discouraged to use this on the Web,
since it has very strong implications: it means that in all uses of
the one identifier, one could just as well use the other identifier,
and that it is indistinguishable if something has been said about
the one or the other. That seems too strong here, at least for most
cases.
* In the world of OWL DL, sameAs specifically refers to individuals,
not to classes or properties. Saying "P sameAs Q" does not imply
that P and Q have the same extension as properties. For the latter,
OWL has the relationship owl:equivalentProperty. This distinction
of instance level and schema level is similar to the distinction we
have between "instance of" and "subclass of".
* Therefore, I would suggest using a property called "subproperty
of" as one way of relating properties (analogously to "subclass
of"). It has to be checked if this actually occurs in Wikidata (do
we have any properties that would be in this relation, or do we make
it a modelling principle to have only the most specific properties
in Wikidata?).
* The relationship from properties to items could be modelled with
the existing property "subject of" (P805).
* It might be useful to also have a taxonomic classification of
properties. For example, we already group properties into properties
for "people", "organisations", etc. Such information could also be
added with a specific property (this would be a bit more like a
"category" system on property pages). On the other hand, some of
this might coincide with constraint information that could be
expressed as claims. For instance, person properties might be those
with "Type" (i.e., "rdfs:domain") constraint human. By the way, our
constraint system could use some systematisation -- there are many
overlaps in what you can do with one constraint or another.
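The difference between the two notions in the first bullets can be illustrated with a toy model. The sketch below treats a knowledge base as a plain set of triples; it is not an OWL reasoner, and the "ex:" identifiers and function names are hypothetical. Two properties are "equivalent" when they relate the same subject-value pairs, whereas "same as" would additionally license replacing one identifier by the other everywhere, including in statements about the identifier itself.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, value)


def extension(prop: str, triples: Set[Triple]) -> Set[Tuple[str, str]]:
    """All (subject, value) pairs related by prop."""
    return {(s, o) for (s, p, o) in triples if p == prop}


def equivalent_properties(p: str, q: str, triples: Set[Triple]) -> bool:
    """Schema-level equivalence: both properties have the same extension."""
    return extension(p, triples) == extension(q, triples)


def substitute(old: str, new: str, triples: Set[Triple]) -> Set[Triple]:
    """What 'same as' would license: replace old by new in every position."""
    swap = lambda term: new if term == old else term
    return {(swap(s), swap(p), swap(o)) for (s, p, o) in triples}


triples: Set[Triple] = {
    ("ex:ada", "ex:bornOn", "1815-12-10"),
    ("ex:ada", "ex:dateOfBirth", "1815-12-10"),
    # A statement about the property identifier itself:
    ("ex:bornOn", "ex:createdBy", "ex:alice"),
}

same_extension = equivalent_properties("ex:bornOn", "ex:dateOfBirth", triples)
merged = substitute("ex:bornOn", "ex:dateOfBirth", triples)
```

In this toy data the two properties have the same extension, yet only "ex:bornOn" was created by "ex:alice". Treating them as "same as" would also attribute that creation to "ex:dateOfBirth", which is the kind of overly strong consequence described above.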
Cheers,
Markus
On 28/05/14 12:14, David Cuenca wrote:
Markus,
The explanation about the implications of renaming/deleting makes the
most sense, and that alone already justifies the separation in two.
It is equally true that when we create a property, we might have
"cleaned" the original concept so much that it might differ (even
slightly) from the understood concept that the item represents.
However, even after that process, the "new" concept is still an item...
The process of imbuing a concept with permanent characteristics (adding
a datatype) and the practical approach also seem to recommend keeping
items and properties separate.
Thanks for showing me that reasoning :)
I am still wondering about how we are going to classify properties.
Maybe it will require a broader discussion, but if they are the same
(or mostly the same) as items, then we can just link them as "same as",
and build the classing structure just for the items. OTOH, if they are
different, then we will need to mirror that classification for
properties, which seems quite redundant. Plus adding a new datatype,
"property".
All in all, my conclusion about this is that properties are just
concepts with special qualities that justify the separation in the
software (even if in real life there is no separation).
Many thanks for your detailed answer, and sorry if I'm bringing up
already discussed topics. It is just that when you stare long into
wikidata, wikidata stares back into you ;)
Cheers,
Micru
On Wed, May 28, 2014 at 11:39 AM, Markus Krötzsch
<mar...@semantic-mediawiki.org> wrote:
Hi David,
Interesting remark. Let's explore this idea a bit. I will give you two
main reasons why we have properties separate, one practical and one
conceptual.
First the practical point. Certainly, everything that is used as a
property needs to have a datatype, since otherwise the wiki would not
know what kind of input UI to show. So you cannot use just any item as
a property straight away -- it needs to have a datatype first. So, yes,
you could abolish the namespace Property but you still would have a
clear, crisp distinction between property items (those with datatype)
and normal items (those without a datatype). Because of this, most of
the other functions would work the same as before (for example,
property autocompletion would still only show properties, not
arbitrary items).
A complication with this approach is that property datatypes cannot
change in Wikibase. This design was picked since there is no way to
convert existing data from one datatype to another in general. So
changing the datatype would create problems by making a lot of data
"invalid", and require special handling and special UI to handle this
situation. With properties living in a separate namespace, this is not
a real restriction: you can just create a new property and give it the
same label (after naming the old one differently, e.g., putting
"DEPRECATED" in its name). Then you can migrate the data in some custom
fashion. But if properties were items, we would have a problem here:
the item is already linked to many Wikipedias and other projects, and
it might be used in Lua scripts, queries, or even external applications
like Denny's JavaScript translation library. You cannot change item ids
easily. Also, many items would not have a datatype, so the first
datatype that is (accidentally?) entered would be fixed. So we would
definitely need to rethink the whole idea of unchangeable datatypes.
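The "create a new property and deprecate the old one" workaround can be sketched roughly as follows. This is only an illustration under simplifying assumptions: PropertyDef, supersede, the ids "P_old"/"P_new" and the datatype names are all hypothetical, and the frozen dataclass merely stands in for the rule that a datatype, once set, is never modified in place.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class PropertyDef:
    """Hypothetical property record; fields cannot be reassigned in place."""
    pid: str
    label: str
    datatype: str


def supersede(registry: Dict[str, PropertyDef], old_pid: str,
              new_pid: str, new_datatype: str) -> PropertyDef:
    """Relabel the old property as deprecated and create a successor
    that takes over the original label with a different datatype."""
    old = registry[old_pid]
    # The old record keeps its id and datatype; only the label changes.
    registry[old_pid] = replace(old, label="DEPRECATED " + old.label)
    registry[new_pid] = PropertyDef(new_pid, old.label, new_datatype)
    return registry[new_pid]


registry = {"P_old": PropertyDef("P_old", "height", "string")}
new_prop = supersede(registry, "P_old", "P_new", "quantity")

# Changing the datatype of an existing record is rejected.
try:
    new_prop.datatype = "string"  # type: ignore[misc]
    datatype_changed = True
except FrozenInstanceError:
    datatype_changed = False
```

Existing data stays attached to "P_old" and remains valid under its original datatype; moving it over to "P_new" is a separate, custom migration step. The sketch works only because nothing outside the registry depends on the old id, which is exactly what would not hold if properties were items.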
My other important reason is conceptual. Properties are not considered
part of the (encyclopaedic) data but rather part of the schema that the
community has picked to organise that data. As in your example,
"emissivity" (Q899670) is a notion in physics as described in a
Wikipedia article. There are many things to say about this notion (for
example, it has a history: somebody must have defined this first --
although Wikipedia does not say it in this case). As in all cases, some
statements might be disputed while others are widely acknowledged to be
"true".
For the property "emissivity" (P1295), the situation is quite
different. It was introduced as an element used to enter data, similar
to a row in a database table or an infobox template in some Wikipedia.
It does probably closely relate to the actual physical notion Q899670,
but it still is a different thing. For example, it was first introduced
by User:Jakec, who is probably not the person who introduced the
physical concept ;-) Anything that we will say about P1295 in the
future refers to the property -- a concept of our own making, that is
not described in any external source (there are no publications
discussing P1295).
This is also the reason why properties are supposed to support *claims*
not *statements*. That is, they will have property-value pairs and
qualifiers, but no references or ranks. Indeed, anything we say about
properties has the status of a definition. If we say it, it's true.
There is no other authority on Wikidata properties.
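The claim/statement distinction can be pictured as two record types. The following is a minimal sketch, not the Wikibase data model itself: the class and field names are assumptions, as are the reference string and the property id "P_corresponds_with_item".

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Claim:
    """A property-value pair with optional qualifiers (no sourcing)."""
    property_id: str
    value: str
    qualifiers: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Statement:
    """A claim plus the sourcing apparatus used on item pages."""
    claim: Claim
    references: List[str] = field(default_factory=list)
    rank: str = "normal"  # Wikibase ranks: preferred, normal, deprecated


# On an item page: a statement that can be sourced, ranked, and disputed.
cement_emissivity = Statement(
    claim=Claim("P1295", "0.54"),
    references=["hypothetical reference to a materials handbook"],
)

# On a property page: a bare claim with the status of a definition.
# "P_corresponds_with_item" is a made-up id for the proposed link.
p1295_definition = Claim("P_corresponds_with_item", "Q899670")
```

Here a Claim simply has no place to put a reference or a rank, which reflects the idea that what is said about a property is true by definition rather than by citation.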
You could of course still have items and properties "share" a page and
somehow define which statements/claims refer to which concept, but this
does not seem to make things easier for users.
These are, for me, the two main reasons why it makes sense to keep
properties apart from items on a technical level. Besides this, it is
also convenient to separate the 1000-something properties from the
15-million something items for reasons of maintenance.
Best regards,
Markus
On 28/05/14 09:25, David Cuenca wrote:
Since the very beginning I have kept myself busy with properties,
thinking about which ones fit, which ones are missing to better
describe reality, and how to integrate them with the ones that we have.
The thing is that the more I work with them, the less difference I see
with normal items... and if soon there will be statements allowed in
property pages, the difference will blur even more.
I can understand that from the software development point of view it
might make sense to have a clear difference. Or for the community to
get a deeper understanding of the underlying concepts represented by
words.
But semantically I see no difference between:
cement (Q45190) <emissivity (P1295)> 0.54
and
cement (Q45190) <emissivity (Q899670)> 0.54
Am I missing something here? Are properties really needed or are we
adding unnecessary artificial constraints?
Cheers,
Micru
_______________________________________________
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
--
Etiamsi omnes, ego non