"Tab Atkins Jr." <jackalm...@gmail.com>, 2012-08-12 15:43 -0700:
> What Dimitri said, but to address your comment directly, DTD-based
> validation is long-dead, at least when applied to HTML. A DTD can't
> capture the validity requirements that the HTML spec already imposes,
> so it's irrelevant if it also can't validate a document containing
> custom elements. The current validator used by the W3C is a
> combination of (iirc) constraints expressed in Schematron and custom
> Java code.

The core of the backend for the W3C Nu Markup Validator
(http://validator.w3.org/nu/) and validator.nu is James Clark's Jing, a
Relax NG implementation.

The backend doesn't actually use Schematron, for performance reasons.
Instead it has some Java code to perform the equivalent of the
assertion-based checking that Schematron provides but that can't be
done with grammar-based checking alone (whether with Relax NG or
anything else).

No grammar-based schema language is capable of expressing all the
constraints in the HTML spec. Things like checking the data types
(microsyntaxes) of attribute values require custom code -- especially
if you want to report useful messages for errors (something
regexp-based checking is totally useless for).

Also, more to the point here, things like the fact that arbitrary
attribute names prefixed with "data-" are valid -- grammar-based
checkers can't handle that at all. So the validator.nu backend has some
custom code that Henri wrote that drops those data-* attributes --
basically, filters them out -- before the Jing part of the toolchain
even sees them.

  --Mike

-- 
Michael[tm] Smith http://people.w3.org/mike
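
For illustration only, the filtering step described above (dropping
data-* attributes before the grammar-based checker sees them) can be
sketched roughly as follows. This is not the validator.nu code, which is
Java; it is a minimal Python sketch using the standard library's SAX
filter support. The names DataAttributeFilter, RecordingHandler, and
attributes_seen_downstream are hypothetical, and RecordingHandler is
only a stand-in for the Jing stage. It also assumes a plain
startswith("data-") test is an adequate approximation of the attribute
name rule.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase
from xml.sax.xmlreader import AttributesImpl


class DataAttributeFilter(XMLFilterBase):
    """Hypothetical filter: strip data-* attributes from SAX events."""

    def startElement(self, name, attrs):
        # Keep every attribute except those whose name starts with "data-".
        kept = {k: v for k, v in attrs.items() if not k.startswith("data-")}
        # Forward the event, minus the dropped attributes, downstream.
        super().startElement(name, AttributesImpl(kept))


class RecordingHandler(xml.sax.ContentHandler):
    """Stand-in for the grammar-based validator (an assumption, not Jing)."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        # Record what the downstream stage actually receives.
        self.seen.append((name, sorted(attrs.getNames())))


def attributes_seen_downstream(markup):
    """Parse markup through the filter; return (element, attribute names)."""
    handler = RecordingHandler()
    reader = DataAttributeFilter(xml.sax.make_parser())
    reader.setContentHandler(handler)
    reader.parse(io.StringIO(markup))
    return handler.seen
```

Calling attributes_seen_downstream on
'<div id="a" data-foo="1"><p data-bar="2" class="c"/></div>' yields
[("div", ["id"]), ("p", ["class"])], so the downstream stage never has
to know that data-* attributes exist.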