Glenn,

On 10/12/2011 08:49 AM, glenn mcdonald wrote:
I agree with this entirely, and it's why I keep insisting that for most purposes datasets should be expressed using local identifiers, with all external linkages called out explicitly and/or externally. owl:sameAs and the use of other people's identifiers for your own nodes are equally dangerous. If I'm asserting that Brussels is the capital of Belgium, I'm saying that my notion of Brussels is my notion of "capital" of my notion of Belgium. I am the authority for that assertion. Saying that my notion of Brussels, "capital" or Belgium correspond with anybody else's notion of anything are separate assertions, for which I do not have the same authority.


I am not sure you are saying the same thing as Hugh.

Who else would be able to make assertions about "your" notion of Brussels vis-a-vis "some other notion of Brussels" with any more authority than your own?

Granted, you may want to package those separately, which I understood to be Hugh's point, to favor software that has difficulty with outliers or contradictory information.

For that matter, the proper interpretation of "correspond" depends on the purpose: for some things, treating "correspond" as owl:sameAs may be exactly right, and for some it might be utterly unacceptable. And it's much easier to map a "corresponds" property to owl:sameAs if you want to than to rewrite an entire dataset to undo the misapplication of IDs or owl:sameAs.


Ignoring owl:sameAs statements isn't an option?
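
By "ignoring" I mean something like the following: a consumer sets the linkage triples aside before loading the rest. This is only a minimal sketch, assuming triples are held as plain (subject, predicate, object) tuples. The example.org identifiers and the `split_linkage` helper are made up for illustration and are not part of any existing library.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def split_linkage(triples, linkage_predicates=(OWL_SAME_AS,)):
    """Separate linkage triples (e.g. owl:sameAs) from the substantive ones.

    Hypothetical helper: triples are plain (subject, predicate, object) tuples.
    """
    substantive, linkage = [], []
    for triple in triples:
        _, predicate, _ = triple
        if predicate in linkage_predicates:
            linkage.append(triple)
        else:
            substantive.append(triple)
    return substantive, linkage


# Illustrative data only; the example.org URIs are assumptions.
triples = [
    ("http://example.org/Brussels",
     "http://example.org/capitalOf",
     "http://example.org/Belgium"),
    ("http://example.org/Belgium",
     OWL_SAME_AS,
     "http://dbpedia.org/resource/Belgium"),
]

substantive, linkage = split_linkage(triples)
```

The consumer can then load `substantive` alone, or add `linkage` back when the purpose warrants it.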

Think global, assert local.

Hmmm, so how do diverse data sets get combined if no one has the authority to make assertions about subjects outside their own data sets?

Hope you are having a great day!

Patrick


glenn


On Wed, Oct 12, 2011 at 7:55 AM, Hugh Glaser <[email protected]> wrote:


    Hi.

    I have argued for a long time that the linkage data (in particular
    owl:sameAs and similar links) should not usually be mixed with the
    knowledge being published.

    Thus, for example as I discussed with Evan for the NYTimes site a
    while ago, it is not a good thing to put the owl:sameAs links
    (which were produced by a relatively unskilled individual over a
    short period of time) at the same status as the other data, which
    has been curated over decades by expert reporters.

    These sameAs links have potentially very different trust,
    provenance, licence, and possibly other non-functional attributes
    from the substantive data.
    Clearly they have different trust and provenance, but licence may
    well be different, as the NYT may want people to take the triples
    away to bring traffic to their site, while keeping the other
    triples under more restricted licence.

    Which brings me to an example of where things have recently gone
    badly wrong.
    I have reported a bug to the dbpedia team wherein the URIs for
    countries have become deeply intertwingled.
    Example queries are at the end of this message - they have to
    explicitly do the owl:sameAs because the store does not do
    owl:sameAs inference, but the outcome is that I can validly infer
    answers such as "Maseru is the capital of Belgium".

    Of course, mistakes happen, so I am not having a specific go at
    dbpedia, which I still think is wonderful.

    But the outcome is that I get very bad data from dbpedia.org
    unexpectedly, which means I (and presumably anyone else) can't
    reliably use dbpedia.org at all (because I use an inference engine
    when I cache the data).
    Had the dbpedia.org site simply stuck to the behaviour I was sort
    of expecting of publishing data from wikipedia (possibly
    publishing the linkage data elsewhere) I would have been in a
    better position.

    One of the issues here is to realise when we are actually adding
    knowledge to a triplification process.
    It is clear when things like owl:sameAs are added that knowledge
    is being added.
    However, it is probably less clear to people that, when they
    directly use URIs from dbpedia or elsewhere, they are also adding
    their own knowledge.
    In a similar way, such use introduces knowledge which may have
    very different trust and provenance from the data being triplified.

    Is this a good way to do things?

    I would say not.
    I have used a wide variety of Linked Data sources, and have found
    problems with almost every one of them (possibly every significant
    one).
    The problems frequently relate to the extra knowledge that the
    triplification process has introduced.
    If only I could be given the data without, then I would not have
    to reject the dataset.

    Thanks for reading this far.
    Best
    Hugh

    Query:
    SELECT DISTINCT ?capital WHERE {
     ?s owl:sameAs <http://dbpedia.org/resource/Belgium> .
     ?s owl:sameAs ?country .
     ?country <http://dbpedia.org/ontology/capital> ?capital .
    }

    As a URI:
    
http://dbpedia.org/snorql/?query=SELECT+DISTINCT+%3Fcapital+WHERE+%7B%0D%0A+%3Fs+owl%3AsameAs+%3Chttp%3A%2F%2Fdbpedia.org%2Fresource%2FBelgium%3E+.%0D%0A+%3Fs+owl%3AsameAs+%3Fcountry+.%0D%0A+%3Fcountry+%3Chttp%3A%2F%2Fdbpedia.org%2Fontology%2Fcapital%3E+%3Fcapital+.%0D%0A%7D%0D%0A

    Output:
    capital
    http://dbpedia.org/resource/City_of_Brussels
    http://dbpedia.org/resource/Maseru
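
The same effect can be reproduced on a toy graph. This is a minimal sketch, not the actual dbpedia data: it assumes a single hypothetical node (`ex:mergedCountry`) that is owl:sameAs both Belgium and Lesotho, and it uses abbreviated names (`dbr:`, `dbo:`) as plain strings. The function walks the three triple patterns of the query in order.

```python
OWL_SAME_AS = "owl:sameAs"
CAPITAL = "dbo:capital"

# Assumed toy data: one node wrongly linked to two different countries.
triples = {
    ("ex:mergedCountry", OWL_SAME_AS, "dbr:Belgium"),
    ("ex:mergedCountry", OWL_SAME_AS, "dbr:Lesotho"),
    ("dbr:Belgium", CAPITAL, "dbr:City_of_Brussels"),
    ("dbr:Lesotho", CAPITAL, "dbr:Maseru"),
}


def capitals_via_same_as(triples, target):
    """Evaluate the three patterns of the query above against the toy graph."""
    # ?s owl:sameAs <target>
    subjects = {s for s, p, o in triples if p == OWL_SAME_AS and o == target}
    # ?s owl:sameAs ?country
    countries = {o for s, p, o in triples if p == OWL_SAME_AS and s in subjects}
    # ?country dbo:capital ?capital
    return {o for s, p, o in triples if p == CAPITAL and s in countries}


result = capitals_via_same_as(triples, "dbr:Belgium")
print(sorted(result))  # ['dbr:City_of_Brussels', 'dbr:Maseru']
```

One bad owl:sameAs link on the shared node is enough for Maseru to appear among the capitals of Belgium.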


    --
    Hugh Glaser,
                 Web and Internet Science
                 Electronics and Computer Science,
                 University of Southampton,
                 Southampton SO17 1BJ
    Work: +44 23 8059 3670, Fax: +44 23 8059 3045
    Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
    http://www.ecs.soton.ac.uk/~hg/





--
Patrick Durusau
[email protected]
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau
