Dear Thad,
The second part of your email has good points in it, too. As you say,
one must allow for adjustments in the intended meaning of a property in
real life, and adjusting too much could be dangerous. The method you
suggest (creating a new property and deprecating the old one, rather
than modifying the meaning of the old one) has been used in Wikidata as
well. Community-controlled mass edits are also very common in Wikidata:
users offer to write conversion tools
and the community decides how to use them. This can help to convert
large amounts of data to a new format or to detect and cross-check
errors. The constraint mechanism you see today is also a
community-created way of avoiding unintended uses (be they caused by
changed definitions or by other causes).
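To make the "deprecate and convert" idea concrete, here is a minimal Python sketch of such a conversion tool. It is only an illustration under stated assumptions: the Statement class, the property IDs "P_OLD" and "P_NEW", and the keep_if_item_id check are all hypothetical and do not correspond to any actual Wikidata tool or API. Statements that cannot be converted safely are left untouched and collected for community review.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified statement: item, property, value."""
    item: str
    prop: str
    value: str


def migrate(
    statements: List[Statement],
    old_prop: str,
    new_prop: str,
    convert: Callable[[str], Optional[str]],
) -> Tuple[List[Statement], List[Statement]]:
    """Move statements from old_prop to new_prop where convert() succeeds.

    Returns (result, needs_review). Statements whose value cannot be
    converted stay on old_prop and are also listed in needs_review.
    """
    result: List[Statement] = []
    needs_review: List[Statement] = []
    for st in statements:
        if st.prop != old_prop:
            result.append(st)  # unrelated property, keep as is
            continue
        new_value = convert(st.value)
        if new_value is None:
            needs_review.append(st)  # flag for a manual cross-check
            result.append(st)        # do not touch the old statement
        else:
            result.append(replace(st, prop=new_prop, value=new_value))
    return result, needs_review


def keep_if_item_id(value: str) -> Optional[str]:
    """Hypothetical check: accept only values that look like an item ID."""
    return value if re.fullmatch(r"Q\d+", value) else None


data = [
    Statement("Q1", "P_OLD", "Q30"),
    Statement("Q2", "P_OLD", "USA"),    # free text, cannot be converted
    Statement("Q3", "P_OTHER", "Q5"),
]
migrated, review = migrate(data, "P_OLD", "P_NEW", keep_if_item_id)
```

Keeping the unconvertible statements on the deprecated property, rather than dropping them, means the community can decide later what to do with them.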
Finally, some basic things that you probably know already:
* Property datatypes in Wikidata cannot be changed after creation
(ensuring that the data always remains structurally valid for this
type). Technically speaking, this is the only "schema" we have (you
mentioned SQL: things are similar there; the database schema does not
include a definition of what you mean by the country column in the band
table; at best it tells you that the country should be a key in the
countries table).
* The experience with Wikipedia has led to many mechanisms for
counteracting spam and vandalism. Patrolling, (semi)protection of pages,
spam fighting robots and scripts, etc. can similarly be used on Wikidata
to fight against deliberate attacks on high-visibility pages such as
properties. In a sense, structured data is making it much easier to
detect problems with automated tools. This should be effective against
most deliberate attempts to cause trouble by changing property labels
etc. (but we will need to do more there, I guess).
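The SQL comparison in the first point can be sketched with Python's built-in sqlite3 module. The table and column names follow the band/countries example above; the rows are made up for illustration. The schema rejects a country that is not a key in the countries table, but it has no way of saying whether "country" means country of origin, of residence, or something else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE countries (code TEXT PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE band ("
    " name TEXT PRIMARY KEY,"
    " country TEXT NOT NULL REFERENCES countries(code))"
)
conn.executemany(
    "INSERT INTO countries VALUES (?, ?)",
    [("GB", "United Kingdom"), ("US", "United States")],
)


def try_insert(name: str, country: str) -> bool:
    """Insert a band; return False if the schema rejects the row."""
    try:
        conn.execute("INSERT INTO band VALUES (?, ?)", (name, country))
        return True
    except sqlite3.IntegrityError:
        return False


# Structurally valid: "GB" is a key in the countries table. Whether it is
# the *intended* country for this band is something the schema cannot know.
accepted = try_insert("Band A", "GB")

# Structurally invalid: "Atlantis" is not a key in the countries table.
rejected = not try_insert("Band C", "Atlantis")
```

This is the same kind of guarantee a fixed property datatype gives: the value has the right shape, while its meaning is left to community agreement.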
Cheers,
Markus
On 08.01.2015 20:37, Thad Guidry wrote:
...
And that is my worry. That the Schema model is publicly editable at any
time.
...
We have the same problem in Freebase, where if by public agreement, we
change the meaning of a Property so much that it might cause erroneous
data statements, then we deprecate that Property and create a new one,
splitting off the various statements into their proper form and letting
the Community know, and also performing the data tasks to subscribe the
old data to the new Schema.
The pollution of data would happen if by agreement P17's Discussion page
drastically changed the intended meaning of it, then all the data that
used P17 would need to be cleaned up.
How does Wikidata intend to deal with those kinds of changes to Property
meanings in the future, and with the data cleanup involved?
_______________________________________________
Wikidata-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-l