Re: Inference for error checking [was Re: How to avoid that collections "break" relationships]

Peter F. Patel-Schneider Mon, 31 Mar 2014 09:02:10 -0700


On 03/31/2014 08:31 AM, David Booth wrote:

On 03/30/2014 03:13 AM, Pat Hayes wrote:

[ . . . ]

> What follows from knowing that


ppp schema:domainIncludes ccc . ?

Suppose you know this and you also know that

x ppp y .

Can you infer x rdf:type ccc? I presume not, since the domain might
include other stuff outside ccc. So, what *can* be inferred about the
relationship between x and ccc ? As far as I can see, nothing can be
inferred. If I am wrong, please enlighten me. But if I am right, what
possible utility is there in even making a schema:domainIncludes
assertion?

If "inference" is too strong, let me weaken my question: what
possible utility **in any way whatsoever** is provided by knowing
that schema:domainIncludes holds between ppp and ccc? What software
can do what with this, that it could not do as well without this?

I think I can answer this question quite easily, as I have seen it come up before in discussions of logic.

Entailment produces statements that are known to be true, given a set of facts and entailment rules. And indeed, adding the fact that


  ppp schema:domainIncludes ccc .

to a set of facts produces no new entailments in that sense.

Is it then your contention that schema:domainIncludes does not add any new entailments under the schema.org semantics?

But it *does* enable another kind of machine-processable inference that is very useful in error checking, which I'll describe.

In error checking, it is sometimes useful to classify a set of statements into three categories: Passed, Failed or Indeterminate. Passed means that the statements are fine (within the checkable limits anyway): sufficient information has been provided, and it is internally consistent. Failed means that there is something malformed about them (according to the application's purpose). Indeterminate means that the system does not have enough information to know whether the statements are okay or not: further work might need to be performed, such as manual examination or adding more information (facts) to the system. Hence, it is *useful* to be able to quickly and automatically establish that the statements fall into the Passed or Failed category.

Note that this categorization typically relies on making a closed world assumption (CWA), which is common for an application to make for a particular purpose -- especially error checking.

I don't see that the CWA is particularly germane here, except that most formalisms that do this sort of checking also utilize some sort of CWA. There is nothing wrong with performing this sort of analysis in formalisms that do not have any form of CWA. What does cause problems with this sort of analysis is the presence of non-trivial inference.

In this example, let us suppose that to pass, the object of every predicate must be in the "Known Domain" of that predicate, where the Known Domain is the union of all declared schema:domainIncludes classes for that predicate. (Note the CWA here.)

Given this error checking objective, if a system is given the facts:

  x ppp y .
  y a ccc .

then without also knowing that "ppp schema:domainIncludes ccc", the system may not be able to determine whether these statements should be considered Passed or Failed: the result may be Indeterminate. But if the system is also told that

  ppp schema:domainIncludes ccc .

then it can safely categorize these statements as Passed (within the limits of this error checking).
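
To make the check concrete, here is a minimal sketch in Python of the three-way categorization described above. It is only an illustration, not anything schema.org defines: the function name classify, the tuple representation of triples, and in particular the rule for Failed (a Known Domain is declared, the object has declared types, and none of them is in the Known Domain) are assumptions filled in for the example. It checks the type of the object y, following the example facts in this message.

```python
# Hypothetical sketch: triples are plain (subject, predicate, object) tuples.
DOMAIN_INCLUDES = "schema:domainIncludes"
RDF_TYPE = "rdf:type"


def classify(triples):
    """Map each ordinary statement to 'Passed', 'Failed' or 'Indeterminate'."""
    known_domain = {}  # predicate -> set of declared domainIncludes classes
    types = {}         # node -> set of declared rdf:type classes
    for s, p, o in triples:
        if p == DOMAIN_INCLUDES:
            known_domain.setdefault(s, set()).add(o)
        elif p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    results = {}
    for s, p, o in triples:
        if p in (DOMAIN_INCLUDES, RDF_TYPE):
            continue  # only ordinary statements such as "x ppp y" are checked
        declared = known_domain.get(p, set())
        node_types = types.get(o, set())
        if not declared or not node_types:
            # Assumption: with nothing declared, the check cannot decide.
            results[(s, p, o)] = "Indeterminate"
        elif declared & node_types:
            results[(s, p, o)] = "Passed"
        else:
            # Assumption: types are declared, but none is in the Known Domain.
            results[(s, p, o)] = "Failed"
    return results


facts = [("x", "ppp", "y"), ("y", RDF_TYPE, "ccc")]
before = classify(facts)[("x", "ppp", "y")]
after = classify(facts + [("ppp", DOMAIN_INCLUDES, "ccc")])[("x", "ppp", "y")]
print(before, "->", after)  # Indeterminate -> Passed
```

Adding the single schema:domainIncludes triple is what moves the statement out of Indeterminate, which is the shift being argued for here.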

Sure, but it can be very tricky to determine just what facts to consider when making this determination, particularly with the upside-down nature of schema:domainIncludes.

Thus, although schema:domainIncludes does not enable any new entailments under the open world assumption (OWA), it *does* enable some useful error checking inference under the closed world assumption (CWA), by enabling a shift from Indeterminate to Passed or Failed.

The CWA actually works against you here.  Given the following triples,

x ppp y .
y rdf:type ddd .
ppp schema:domainIncludes ccc.

you are determining whether

y rdf:type ccc.

is entailed, whether its negation is entailed, or neither. The relevant CWA would push these last two together, making it impossible to have a three-way determination, which you want.
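
A rough sketch of that collapse, in Python. This is a deliberate simplification and rests on assumptions: "entailed" is reduced to plain membership in the set of stated triples, and the known_false set is a hypothetical stand-in for whatever mechanism would let a negation be entailed. The names Status, status_open and status_closed are made up for the illustration.

```python
from enum import Enum


class Status(Enum):
    ENTAILED = "entailed"
    NEGATION_ENTAILED = "negation entailed"
    NEITHER = "neither"


def status_open(triple, facts, known_false):
    """Without a CWA: a triple that is not stated is simply unknown."""
    if triple in facts:
        return Status.ENTAILED
    if triple in known_false:  # hypothetical source of negative information
        return Status.NEGATION_ENTAILED
    return Status.NEITHER


def status_closed(triple, facts):
    """With a CWA: anything not stated is taken to be false."""
    if triple in facts:
        return Status.ENTAILED
    return Status.NEGATION_ENTAILED


facts = {
    ("x", "ppp", "y"),
    ("y", "rdf:type", "ddd"),
    ("ppp", "schema:domainIncludes", "ccc"),
}
query = ("y", "rdf:type", "ccc")

print(status_open(query, facts, known_false=set()))  # Status.NEITHER
print(status_closed(query, facts))                   # Status.NEGATION_ENTAILED
```

The closed-world function can never return NEITHER, so only two outcomes remain.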

If anyone is concerned that this use of the CWA violates the spirit of RDF, which indeed is based on the OWA (for *very* good reason), please bear in mind that almost every application makes the CWA at some point, to do its job.

David


peter
