On 08.01.2015 20:37, Thad Guidry wrote:
...


Right, Freebase would not stick a Property called "Country" right on an
instance of a Music Band.  We would put Country under the Musical Group
type, and give it a better definition like "The nation or territory that
this item originated from".  Freebase's Properties always live under a
Freebase Type, like "Musical Group".  Which is why on Wikidata, even
seeing P17 on the U2 topic page makes me wonder what kind of schema
Wikidata is trying to pull off.  It appears that someone did not really
read the description page of P17, as I just did; otherwise they would
have seen that it is not allowed to be used like that, and that P27
should have been used instead.  But then you can't have a date of birth
for a Musical Group (band), which rules out using even P27 on an
instance of a band.

I understand, there are many holes in Wikidata's schema currently.  I am
one of several Freebase experts coming over who can help Wikidata
identify those problematic Schemas. :-)

Dear Thad,

It is important to realize that there is a crucial difference between the way schema information is organised in Wikidata and in Freebase. Freebase uses a type-based schema, where types (roughly our "classes") are the main organisation unit, with properties being subordinate (like attributes of a certain type of object in programming). In contrast, Wikidata uses a property-based schema where properties are the first-class citizens and types are only one particular kind of value one could give to certain properties (like, e.g., in the W3C Resource Description Framework).
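
To make the contrast concrete, here is a minimal sketch in Python. All names in it (TypeSchema, the property labels, the values helper) are invented for illustration and are not the actual Freebase or Wikidata data models; it only shows where a property "lives" in each view.

```python
from dataclasses import dataclass, field


@dataclass
class TypeSchema:
    """Type-based view: a property exists only as part of a type."""
    name: str
    properties: set = field(default_factory=set)

    def allows(self, prop):
        # A property may be used only if the type declares it.
        return prop in self.properties


# Hypothetical type with hypothetical property labels.
musical_group = TypeSchema("Musical Group", {"country of origin", "member"})

# Property-based view: plain (subject, property, value) statements.
# The class of an item is just the value of one more property.
statements = [
    ("U2", "instance of", "band"),
    ("U2", "country", "Ireland"),
    ("Ireland", "instance of", "country"),
]


def values(item, prop):
    """Return all values stated for `prop` on `item`."""
    return [v for (s, p, v) in statements if s == item and p == prop]
```

In the first view the question is "does the type Musical Group declare this property?"; in the second, any property can be stated on any item, and whether that use is sensible is checked separately.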

Both of these approaches have been used in many places, and a discussion about which one is better on principled grounds would not be conclusive. To be productive together, however, it is important to realise that we are translating between two different worlds here. It is not about "fixing" one world so it fits the viewpoint of the other, but about understanding the system that is in place and adapting to it.

Independent of these things, there are of course plenty of ways to improve the data and schema. The case you focussed on is quite typical of the general question of whether classes/properties should be broader or narrower in their scope. Definitions that are too broad lead to data of unclear meaning and little information value. Definitions that are too narrow lead to incomparable "local" data formats that make it hard or impossible to combine data on relatively similar things, since they use different schemas. There is always a tension between the two, and I am sure we have cases where we err in either direction.

Note that this problem is not caused by a property-based view -- it's just a question of modelling that has to be discussed in each case. As for the case of "country" values for bands, it seems to me completely natural to use it without any further qualification. It is rather broad, I admit, but I understand the meaning just like I understand the sentence "U2 are an Irish rock band from Dublin" [1]. I don't think we should change this to "U2 are a rock band that has originated in the country of Ireland" or "U2 are a rock band whose members are of Irish nationality" to clarify this. It is very common to associate nationalities with bands and "country" seems to be a sensible name for the respective property. If you think that there could be some misunderstanding then it might be that we need to use properties with narrower meanings, but it is still useful to have a simple broad way of displaying bands by country, etc. And as long as it is clear to most people which relationships in "real life" this tries to capture, I don't think any action is needed.

Cheers,

Markus

[1] https://en.wikipedia.org/wiki/U2




        2. How does Wikidata want to handle locking down Property
        descriptions (Freebase uses Permissions and Owners), where a
        change to the complete meaning of something might cause
        severely polluted data?


    There is no such thing in wikis.
    http://c2.com/cgi/wiki?WikiDesignPrinciples
    https://meta.wikimedia.org/wiki/The_wiki_way


But Wikidata is not a "wiki" in the true sense, or should not be
presented as one... because it is not Schema-less, but in fact
subscribes to a publicly editable and agreed-upon Schema model.

One thing I did notice is that the Wikidata Schema model is actually
composed of agreement on the 2 tabs of
https://www.wikidata.org/wiki/Property_talk:P17 (the Property tab AND
the Discussion tab), which combined give the effective model of the
Property.  In Freebase, we would just have the Property, where all
rules and definitions about it are stored (Discussions about a Property
were stored on our wiki and also on our mailing list).  I enjoy the
Wikidata way a bit more compared to Freebase, the benefit being a
primary place to see the definition of the Property as well as the past
Discussion and questions about it.

    The errors are corrected after the fact; the central control system
    is not made of permissions, but of checks like the constraint
    violations bots mentioned above. What other pollution of the data
    do you have in mind?


And that is my worry.  That the Schema model is publicly editable at any
time.  And constraint violations are only effective against a "Well
Defined Property".  But what if I do not define that property well, or
worse, I completely change the meaning of that Property?  Imagine if I
suddenly change the meaning of one of your MySQL table columns... like,
PERSON suddenly becomes FURNITURE.  That can happen with Wikidata's
publicly editable Schema model... if someone maliciously changes the
description of that P17 Country to something very generic like "a
state"... oh really?  What kind of state?  Nations only?  Or
territories considered as an organized political community under one
government?  Or both?  It appears that P17's Discussion clarifies this
a bit, and defines it a bit more narrowly, and would not allow just any
territory with a political community.

We have the same problem in Freebase, where if by public agreement, we
change the meaning of a Property so much that it might cause erroneous
data statements, then we deprecate that Property and create a new one,
splitting off the various statements into their proper form and letting
the Community know, and also performing the data tasks to subscribe the
old data to the new Schema.
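
As a rough illustration of the deprecate-and-split procedure described above, here is a sketch in Python. It is not actual Freebase or Wikidata tooling: the property labels, the kinds lookup, and the rule for choosing a replacement property are all made up for the example.

```python
def split_property(statements, old_prop, pick_replacement):
    """Rewrite every use of a deprecated property to a narrower one.

    `statements` is a list of (subject, property, value) tuples;
    `pick_replacement(subject)` returns the new property label.
    """
    migrated = []
    for subject, prop, value in statements:
        if prop == old_prop:
            prop = pick_replacement(subject)
        migrated.append((subject, prop, value))
    return migrated


# Hypothetical lookup of what kind of thing each subject is.
kinds = {"U2": "band", "Bono": "person"}


def pick_replacement(subject):
    # Assumed rule: bands get an origin, everything else a citizenship.
    if kinds.get(subject) == "band":
        return "country of origin"
    return "country of citizenship"


old = [
    ("U2", "country", "Ireland"),
    ("Bono", "country", "Ireland"),
    ("U2", "genre", "rock"),
]
new = split_property(old, "country", pick_replacement)
```

Statements that never used the deprecated property pass through unchanged, so the migration can be run over the whole dataset.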

The pollution of data would happen if by agreement P17's Discussion page
drastically changed the intended meaning of it, then all the data that
used P17 would need to be cleaned up.

How does Wikidata intend to deal with those kinds of changes to Property
meanings in the future, and with the data cleanup involved?



_______________________________________________
Wikidata-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


