Re: Inference for error checking [was Re: How to avoid that collections "break" relationships]

David Booth Mon, 31 Mar 2014 13:42:12 -0700

On 03/31/2014 11:59 AM, Peter F. Patel-Schneider wrote:


On 03/31/2014 08:31 AM, David Booth wrote:

On 03/30/2014 03:13 AM, Pat Hayes wrote:

[ . . . ]

> What follows from knowing that


ppp schema:domainIncludes ccc . ?

Suppose you know this and you also know that

x ppp y .

Can you infer x rdf:type ccc? I presume not, since the domain might
include other stuff outside ccc. So, what *can* be inferred about the
relationship between x and ccc ? As far as I can see, nothing can be
inferred. If I am wrong, please enlighten me. But if I am right, what
possible utility is there in even making a schema:domainIncludes
assertion?

If "inference" is too strong, let me weaken my question: what
possible utility **in any way whatsoever** is provided by knowing
that schema:domainIncludes holds between ppp and ccc? What software
can do what with this, that it could not do as well without this?


I think I can answer this question quite easily, as I have seen it
come up before in discussions of logic.

Entailment produces statements that are known to be true, given a set
of facts and entailment rules.  And indeed, adding the fact that

  ppp schema:domainIncludes ccc .

to a set of facts produces no new entailments in that sense.


Is it then your contention that schema:domainIncludes does not add any
new entailments under the schema.org semantics?

Sorry, I misspoke. I did not mean to be taking a position on that, as I
have not looked at that in any detail. The intent of my post was only
to point out how -- even if there weren't any new entailments --
schema:domainIncludes *does* still enable some useful inference for
error checking purposes.

But it *does* enable another kind of very useful machine-processable
inference that is useful in error checking, which I'll describe.

In error checking, it is sometimes useful to classify a set of
statements into three categories: Passed, Failed or Indeterminate.
Passed means that the statements are fine (within the checkable limits
anyway): sufficient information has been provided, and it is
internally consistent.  Failed means that there is something malformed
about them (according to the application's purpose). Indeterminate
means that the system does not have enough information to know whether
the statements are okay or not: further work might need to be
performed, such as manual examination or adding more information
(facts) to the system. Hence, it is *useful* to be able to quickly and
automatically establish that the statements fall into the Passed or
Failed category.

Note that this categorization typically relies on making a closed
world assumption (CWA), which is common for an application to make for
a particular purpose -- especially error checking.


I don't see that the CWA is particularly germane here, except that most
formalisms that do this sort of checking also utilize some sort of CWA.
There is nothing wrong with performing this sort of analysis in
formalisms that do not have any form of CWA.  What does cause problems
with this sort of analysis is the presence of non-trivial inference.


In this example, let us suppose that to pass, the object of every
predicate must be in the "Known Domain" of that predicate, where the
Known Domain is the union of all declared schema:domainIncludes
classes for that predicate.   (Note the CWA here.)

Given this error checking objective, if a system is given the facts:

  x ppp y .
  y a ccc .

then without also knowing that "ppp schema:domainIncludes ccc", the
system may not be able to determine that these statements should be
considered Passed or Failed: the result may be Indeterminate.  But if
the system is also told that

  ppp schema:domainIncludes ccc .

then it can safely categorize these statements as Passed (within the
limits of this error checking).
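
To make the check concrete, here is a minimal sketch in Python. It is only
an illustration under stated assumptions: triples are plain string tuples,
the names Verdict, known_domain, declared_types and check are hypothetical,
and the rules for returning Failed or Indeterminate when a type or a
declaration is missing are one possible reading of the description above,
not a defined schema.org behavior.

```python
from enum import Enum

DOMAIN_INCLUDES = "schema:domainIncludes"
RDF_TYPE = "rdf:type"


class Verdict(Enum):
    PASSED = "Passed"
    FAILED = "Failed"
    INDETERMINATE = "Indeterminate"


def known_domain(triples, predicate):
    """Union of all classes declared via schema:domainIncludes for predicate.

    Treating this union as complete is where the closed world assumption
    enters the check.
    """
    return {o for s, p, o in triples if s == predicate and p == DOMAIN_INCLUDES}


def declared_types(triples, node):
    """Classes explicitly asserted for node with rdf:type."""
    return {o for s, p, o in triples if s == node and p == RDF_TYPE}


def check(triples, statement):
    """Classify one statement as Passed, Failed or Indeterminate.

    Assumption: following the example in this thread, the node that is
    tested against the Known Domain is the object of the statement.
    """
    _, predicate, obj = statement
    domain = known_domain(triples, predicate)
    types = declared_types(triples, obj)
    if not domain or not types:
        # Nothing declared for the predicate, or nothing known about the
        # node: the checker cannot decide either way.
        return Verdict.INDETERMINATE
    return Verdict.PASSED if types & domain else Verdict.FAILED


statement = ("x", "ppp", "y")
facts = {statement, ("y", RDF_TYPE, "ccc")}
print(check(facts, statement))  # no declaration for ppp yet
print(check(facts | {("ppp", DOMAIN_INCLUDES, "ccc")}, statement))
```

With only the two facts, the sketch reports Indeterminate; once the
schema:domainIncludes triple is added, the same statement is reported as
Passed.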


Sure, but it can be very tricky to determine just what facts to consider
when making this determination, particularly with the upside-down nature
of schema:domainIncludes.

My assumption in this example is that the application already has a set
of assertions that it intends to work with, and it wishes to error check
them.


Thus, although schema:domainIncludes does not enable any new
entailments under the open world assumption (OWA), it *does* enable
some useful error checking inference under the closed world assumption
(CWA), by enabling a shift from Indeterminate to Passed or Failed.

The CWA actually works against you here.  Given the following triples,

x ppp y .                       # Triple A
y rdf:type ddd .                # Triple B
ppp schema:domainIncludes ccc.  # Triple C

you are determining whether

y rdf:type ccc.                 # Triple E

is entailed, whether its negation is entailed, or neither.  The relevant
CWA would push these last two together, making it impossible to have a
three-way determination, which you want.

I don't think that's quite it. The error check that I described is not
the same as checking whether NOT(y rdf:type ccc) is entailed. (Such a
conclusion could be entailed if there were an owl:disjointWith
assertion, for example.) It is checking whether (y rdf:type
KnownDomain(ppp)). In other words, the CWA is not being made in testing
whether (y rdf:type ccc); rather it is being made in computing
KnownDomain(ppp).

The net effect of this is that the CWA is being used to distinguish
between cases that would all be considered "unknown" under the OWA.
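
A small Python sketch may help show where the CWA enters. It is a
simplification under stated assumptions: triples are string tuples, the
function names owa_status and cwa_status are hypothetical, and no
subclass or owl:disjointWith reasoning is modeled. The only closed-world
step is in known_domain, which treats the declared schema:domainIncludes
classes as the complete list.

```python
DOMAIN_INCLUDES = "schema:domainIncludes"
RDF_TYPE = "rdf:type"


def known_domain(triples, predicate):
    """Closed-world step: only the declared classes are considered."""
    return {o for s, p, o in triples if s == predicate and p == DOMAIN_INCLUDES}


def types_of(triples, node):
    """Classes explicitly asserted for node with rdf:type."""
    return {o for s, p, o in triples if s == node and p == RDF_TYPE}


def owa_status(triples, predicate, node):
    """Open-world reading: membership is either asserted or unknown.

    Without disjointness axioms, nothing here can establish non-membership.
    """
    asserted = types_of(triples, node) & known_domain(triples, predicate)
    return "entailed" if asserted else "unknown"


def cwa_status(triples, predicate, node):
    """Error-checking reading, with KnownDomain(predicate) taken as closed."""
    domain = known_domain(triples, predicate)
    types = types_of(triples, node)
    if not domain or not types:
        return "Indeterminate"
    return "Passed" if types & domain else "Failed"


# Triples A and B only, then A, B and C from the example above.
no_declaration = {("x", "ppp", "y"), ("y", RDF_TYPE, "ddd")}
wrong_type = no_declaration | {("ppp", DOMAIN_INCLUDES, "ccc")}

for facts in (no_declaration, wrong_type):
    print(owa_status(facts, "ppp", "y"), cwa_status(facts, "ppp", "y"))
```

In this sketch both fact sets are "unknown" under the open-world reading,
while the closed-world check separates them into Indeterminate and Failed.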


David


If anyone is concerned that this use of the CWA violates the spirit of
RDF, which indeed is based on the OWA (for *very* good reason), please
bear in mind that almost every application makes the CWA at some
point, to do its job.

David


peter
