Richard Cyganiak wrote:
> On 9 Jul 2008, at 00:11, Bijan Parsia wrote:
> [big snip]
>> Complaining that the Big Nasty People Who Know What They're Talking
>> About are raining on your sameAs parade isn't constructive.
> Ah Bijan. How about *you* grow up, flameboy?
(Please soften your language, both of you. Consider picking up the phone
instead.)
> You keep asserting that There Are Technical Problems With Using sameAs.
> It would help your argument if you told us what those technical problems
> actually *are*. I heard you say that using owl:sameAs could bite us in
> the butt. Could you be more specific?
The core idea is quite simple, and concerns when two (RDF/OWL)
documents are describing (typically amongst other things) one and the
same entity. Where it is true to say they describe the same entity,
'owl:sameAs' is one handy way to express that situation. Where the two
documents describe different entities, it is not true to say that
owl:sameAs holds between the identifiers they use. This is all
irrespective of which document (if any) the owl:sameAs claims are made
in; it is cast purely in terms of whether the claim is true. And the
main thing to remember about OWL here is that if the owl:sameAs claim is
true, and we believe both documents, all information about that entity
from both documents gets pooled.
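To make "gets pooled" concrete, here is a minimal Python sketch of that merging (often called "smushing"). It makes several assumptions. Triples are plain (subject, predicate, object) string tuples. The example.org/example.net URIs are hypothetical. A union-find structure stands in for the fact that owl:sameAs is symmetric and transitive. This is not an OWL reasoner: it only renames co-referent identifiers to one canonical name and ignores everything else OWL entails.

```python
# Minimal sketch of owl:sameAs "pooling" (smushing), NOT a full OWL reasoner.
# Assumption: triples are (subject, predicate, object) string tuples.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF = "http://xmlns.com/foaf/0.1/"


def pool(triples):
    """Rewrite triples so every owl:sameAs cluster shares one canonical name."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Pick the lexicographically smallest root so output is deterministic.
            lo, hi = sorted((ra, rb))
            parent[hi] = lo

    # sameAs is symmetric and transitive, so union-find captures its closure.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)

    # Every other statement now talks about the canonical identifier.
    return {(find(s), p, find(o)) for s, p, o in triples if p != OWL_SAME_AS}


if __name__ == "__main__":
    # Hypothetical documents A and B, each describing "the same" person.
    doc_a = [
        ("http://example.org/a#tim", FOAF + "name", "Tim"),
        ("http://example.org/a#tim", OWL_SAME_AS, "http://example.net/b#tbl"),
    ]
    doc_b = [("http://example.net/b#tbl", FOAF + "homepage", "http://example.net/tim/")]
    for triple in sorted(pool(doc_a + doc_b)):
        print(triple)
```

Note that the code has no way to know whether the sameAs triple is true. Given a false one, it would pool two different people's descriptions just as happily.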
> Many people in this forum, including me, do not have a background in
> formal logics. Without that background, it is hard to distinguish proper
> uses of owl:sameAs from improper uses of owl:sameAs.
This is true, regarding the list. There are people from a great variety
of backgrounds around here. And on a good day, that is one of our strengths.
> A side note: The reason why I advocate the use of owl:sameAs is not that
> it's the *right* solution. But it's *the only solution that was
> available*. The alternative would have been to argue for a year or two
> instead of linking up our datasets. Not compelling. That being said, I'm
> very interested in hearing your take on when I should use owl:sameAs and
> when not.
One metric here might simply be: what percentage of owl:sameAs claims in
the LOD scene are false? Even so, publishing false claims isn't always a
bad thing in itself. Sometimes publishing false information online has
value, for example historical data. Life is a lot easier, though, if at
least the identity reasoning we do is based on reliable information. For
this reason, publishing false identity claims can be a lot more
destructive than publishing other kinds of falsehood. The LiveJournal
RDF/FOAF dataset, for example, might be full of tens of thousands of fake
birthdate properties. We kinda expect that. And we should also expect to
see a rise in spam blogs making false identity claims about their owners.
Dealing with the latter is a bigger pain, though. For datasets that come
from relatively trusted sources, it is a big win if we can believe the
identity-related claims they make.
If the best data / tools you have suggest that two docs/datasets are
describing the selfsame entity, using owl:sameAs seems fine, even if you
have a secret hunch you're only perhaps 95% confident of the data
quality or tool reliability. If the best information you have instead is
telling you "these two documents seem to be talking about more or less
the same notion", then owl:sameAs probably isn't for you: it doesn't
communicate what you know. Which of these situations you're in might be
something of a judgement call, but it should be a judgement call
grounded in clarity about what a use of owl:sameAs is claiming.
I doubt we can get very far with this in the absence of examples. Would
anyone like to collect a dozen or so owl:sameAs claims published
explicitly on the Web that might be considered questionable? (For now,
let's set aside cases where owl:sameAs is implied by other constructs.)
cheers,
Dan
--
http://danbri.org/