On Dec 19, 2006, at 11:55, Bjoern Hoehrmann wrote:
You might want to read my notes on the subject. They do not propose any
particular format, but list requirements and problems to be solved by a
format of this kind, see <http://esw.w3.org/topic/MarkupValidator/M12N>.
"The current validator supports multiple input sources, file upload,
textarea, and retrieval of remote resources. An observation is
naturally bound to the input retrieved through these sources (or
their metadata) and should thus be identified in the observation
instance."
I don't see why the source needs to be identified. Surely the client
invoking the checker knows what it sent as the input.
(Associating the URI of the entity with a source line and column is
subtly different from merely echoing things about the input that the
provider of the input already knows.)
"The descriptor should be extensible to allow for different location
addressing schemes"
Then consumers of the format would need to support different
addressing schemes.
"A related question is how the results would be presented in the
XHTML interface, it could be a hierarchy like"
"Well-formedness errors:"
"DTD-Validity errors:"
"Link Check"
Since off-the-shelf libraries don't usually categorize errors like
that, introducing such categorization as an afterthought could well
go into the territory of diminishing returns, because the cost of
introducing categorization would be great compared to the benefit.
For example, the SAX2 ErrorHandler interface doesn't guarantee that a
report of an "error" carries any data beyond stating that an error
occurred. In practice, an English-language message is available. Most
often, an approximate source location is available as well. Extracting
any more data than that generally requires hacking into the
off-the-shelf libraries and subverting the usual reporting mechanism.
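
To make the point concrete, here is a minimal sketch of roughly everything a
SAX2 ErrorHandler can portably read from a report: a severity (implied by
which callback was invoked), a message string, and a line/column pair that may
be approximate or missing. The class name CollectingErrorHandler and the
helper check() are made up for this illustration; the parser is whatever
SAXParserFactory returns by default, and that parser is assumed to rethrow
fatal errors after reporting them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical helper: records what each ErrorHandler callback exposes.
final class CollectingErrorHandler implements ErrorHandler {
    final List<String> reports = new ArrayList<>();

    private void record(String severity, SAXParseException e) {
        // Line and column are -1 when the parser does not know them.
        reports.add(severity + " at " + e.getLineNumber() + ":"
                + e.getColumnNumber() + ": " + e.getMessage());
    }

    @Override public void warning(SAXParseException e) { record("warning", e); }
    @Override public void error(SAXParseException e) { record("error", e); }
    @Override public void fatalError(SAXParseException e) { record("fatal", e); }

    // Parses the given string and returns the collected reports.
    static List<String> check(String xml)
            throws IOException, ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // A fatal error was already recorded through fatalError();
            // the parser rethrows it to stop processing.
        }
        return handler.reports;
    }
}
```

Note that nothing in the report says whether the problem is a
well-formedness error, a validity error or something else; any such category
would have to be inferred from the message text or added inside the library.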
"bla bla ... branding ... outreach ... community ... positive
statements ... terminology ..."
:-)
In the context of that document, the need for a common format came from
the desire to enable multiple independent tools to combine the results
at low and high levels, for example, to combine multiple "microformat"
checkers with a general-purpose XHTML Validator.
If I were to integrate a microformat checker with my validation
service, I'd prefer to integrate them in-process. That is, the
checkers would need to consume SAX2 ContentHandler events and report
to a SAX2 ErrorHandler. Of course, such an arrangement would require
the checkers to be written in Java.
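
As a rough sketch of what such an in-process checker could look like: the
class below consumes ContentHandler events and reports through whatever
ErrorHandler it is given. The class name VcardChecker, the run() helper and
the rule itself (an element with class "vcard" must contain a descendant with
class "fn") are simplified assumptions for illustration only, not a real
microformat checker.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical checker: consumes SAX2 events, reports to an ErrorHandler.
final class VcardChecker extends DefaultHandler {
    private final ErrorHandler errorHandler;
    private Locator locator;      // supplied by the parser, may stay null
    private int depth = 0;
    private int vcardDepth = -1;  // depth of the open "vcard" element, or -1
    private boolean sawFn;

    VcardChecker(ErrorHandler errorHandler) { this.errorHandler = errorHandler; }

    @Override public void setDocumentLocator(Locator locator) { this.locator = locator; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        String cls = atts.getValue("", "class");
        List<String> tokens = cls == null
                ? Collections.<String>emptyList()
                : Arrays.asList(cls.trim().split("\\s+"));
        if (vcardDepth < 0 && tokens.contains("vcard")) {
            vcardDepth = depth;
            sawFn = false;
        } else if (vcardDepth >= 0 && tokens.contains("fn")) {
            sawFn = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth == vcardDepth) {
            if (!sawFn) {
                // The location is that of the end tag, hence only approximate.
                errorHandler.error(new SAXParseException(
                        "Element with class \"vcard\" has no descendant with class \"fn\".",
                        locator));
            }
            vcardDepth = -1;
        }
        depth--;
    }

    // Runs the checker over a string and returns the reported messages.
    static List<String> run(String xml) throws Exception {
        final List<String> messages = new ArrayList<>();
        ErrorHandler collector = new ErrorHandler() {
            @Override public void warning(SAXParseException e) { messages.add(e.getMessage()); }
            @Override public void error(SAXParseException e) { messages.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        };
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(new VcardChecker(collector));
        reader.setErrorHandler(collector);
        reader.parse(new InputSource(new StringReader(xml)));
        return messages;
    }
}
```

Because the checker and the parser share one ErrorHandler, the host
application sees the checker's messages through the same channel as the
parser's own, with no intermediate serialization format.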
The primary use case for the Web service format that I am considering
is allowing e.g. a blogging system to send a document off to a Web
service for checking so that the blogging system doesn't need to
contain an in-process conformance checker.
Also note that the ISO 19757-3 (Schematron) specification defines a
reporting format.
The way I have seen Schematron used (which is also how I use it
myself) makes whether an error check is implemented as a failing
assertion or as a succeeding report an implementation detail which
shouldn't be exposed to end users or even software observers outside
the Schematron engine. I think I'll patch my copy of Jing/oNVDL at
some point to hide whether a message was generated by a failed
assertion or a report.
Moreover, of late, I have started to consider the Schematron of the
HTML5 conformance checker a mere rapid prototype of a hand-crafted,
more CPU-efficient and more memory-efficient exclusion and
referential integrity checker.
Thank you for the pointers.
--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/