Disclaimer: Still not a WG response.
Changed W3C list to www-archive, because this reply isn't feedback
about HTML 5.
On Feb 17, 2008, at 23:08, Frank Ellermann wrote:
Henri Sivonen wrote:
Validator.nu checks the combination of the protocol
entity body and the Content-Type header. Pretending
that Content-Type didn't matter wouldn't make sense
when it does make a difference in terms of processing
in a browser.
I checked if the W3C validator servers still claim that
application/xml-external-parsed-entity is chemical/x-pdb
This was either fixed, or it is an intermittent problem,
therefore I can continue my I18N tests today.
It was fixed.
XHTML 1 like HTML 4 wants URIs in links.
HTML 4.01 already defined IRI-compatible processing for the path and
query parts, so now that there are actual IRIs, making Validator.nu
complain about them doesn't seem particularly productive.
For experiments with
IRIs I created a homebrewed XHTML 1 i18n document type.
Actually the same syntax renaming URI to IRI everywhere,
updating RFC 2396 + 3066 to 3987 + 4646 in DTD comments,
That's a pointless exercise, because neither browsers nor validators
ascribe meaning to DTD comments or production identifiers.
To get some results related to the *content* of my test
files I have to set three options explicitly:
* Be "lax" about HTTP content - whatever that is, XHTML 1
does not really say "anything goes", but validator.nu
apparently considers obscure "advocacy" pages instead
of the official XHTML 1 specification as "normative".
Validator.nu treats HTML 5 as normative and media type-based
dispatching in browsers as congruent de facto guidance.
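To make "media type-based dispatching" concrete, here is a minimal sketch of how a checker might pick a parser from the Content-Type header alone. This is an illustration only, not Validator.nu's actual code: the function name pick_parser and the exact set of media types treated as XML are assumptions.

```python
def pick_parser(content_type: str) -> str:
    """Return 'html', 'xml', or 'unsupported' for a Content-Type header value.

    Hypothetical helper: the list of XML media types is an assumption
    made for this sketch, not taken from any particular checker.
    """
    # Drop parameters such as "; charset=KOI8-R" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        return "html"
    xml_types = ("application/xhtml+xml", "application/xml", "text/xml")
    if media_type in xml_types or media_type.endswith("+xml"):
        return "xml"
    # Anything else (for example chemical/x-pdb) is not dispatched
    # to either parser, regardless of what the entity body looks like.
    return "unsupported"
```

Under this sketch, a document served as text/html goes to the HTML parser even if its body is well-formed XHTML, which is the behaviour a "lax" option would override.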
With those three explicitly set options it could finally
report that my test page is "valid" XHTML 1 transitional.
But it's *not*, it uses real IRIs in places where only URIs
are allowed, a major security flaw in DTD based validators:
<http://omniplex.blogspot.com/2007/11/broken-validators.html>
I've fixed the schema preset labeling to say "+ IRI".
| Warning: XML processors are required to support the UTF-8
| and UTF-16 character encodings. The encoding was KOI8-R
| instead, which is an incompatibility risk.
Untested, I hope US-ASCII wouldn't trigger this warning, as
a mobile-ok prototype did some months ago (and maybe still
does).
US-ASCII and ISO-8859-1 (their preferred IANA names only) don't
trigger that warning, because I don't have evidence of XML processors
that didn't support those two in addition to the required encodings.
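The rule described here can be sketched as a small check. This is an assumption-laden illustration, not the checker's real implementation: the function name encoding_warning is hypothetical, and case-insensitive matching of encoding names is assumed.

```python
from typing import Optional

# Encodings every XML processor is required to support.
REQUIRED = {"UTF-8", "UTF-16"}
# Exempted in practice; only the preferred IANA names, so aliases
# such as "latin1" are not in this set.
EXEMPT = {"US-ASCII", "ISO-8859-1"}


def encoding_warning(declared: str) -> Optional[str]:
    """Return a warning message, or None if the encoding is unproblematic."""
    name = declared.strip().upper()  # assumption: names compare case-insensitively
    if name in REQUIRED or name in EXEMPT:
        return None
    return (
        "Warning: XML processors are required to support the UTF-8 "
        f"and UTF-16 character encodings. The encoding was {declared} "
        "instead, which is an incompatibility risk."
    )
```

With this sketch, encoding_warning("KOI8-R") produces the message quoted above, while "US-ASCII" and "ISO-8859-1" produce none.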
Validator.nu accepts U-labels (UTF-8) in system identifiers,
W3C validator doesn't, and I also think they aren't allowed
in XML 1.0 (all editions). Martin suggested they are okay,
see <http://www.w3.org/Bugs/Public/show_bug.cgi?id=5279>.
Validator.nu URIfies system ids using the Jena IRI library set to the
XML system id mode.
Considering that the XML spec clearly sought to allow IRIs ahead of
the IRI spec, would it be actually helpful to change this even if a
pedantic reading of specs suggested that the host part should be in
Punycode?
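For readers unfamiliar with what "URIfying" an IRI involves, the following is a rough sketch of one possible mapping: the host is converted to Punycode (A-labels) and non-ASCII characters elsewhere are percent-encoded as UTF-8. It uses only the Python standard library and does not reproduce the Jena IRI library's behaviour. The function name iri_to_uri is hypothetical, and the sketch ignores userinfo and uses Python's built-in IDNA 2003 codec.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (simplified; drops any userinfo)."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # IDNA ToASCII turns each non-ASCII label (U-label) into an
    # "xn--" A-label; pure ASCII labels pass through unchanged.
    ascii_host = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        ascii_host = f"{ascii_host}:{parts.port}"
    # Keep reserved characters and existing percent escapes intact;
    # everything else outside ASCII is percent-encoded as UTF-8.
    keep = "/%:@!$&'()*+,;="
    path = quote(parts.path, safe=keep)
    query = quote(parts.query, safe=keep + "?")
    fragment = quote(parts.fragment, safe=keep + "?")
    return urlunsplit((parts.scheme, ascii_host, path, query, fragment))
```

A checker that instead passes U-labels through untouched, as discussed above, would simply skip the host conversion step.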
Validator.nu rejects percent encoded UTF-8 labels in system
identifiers, like the W3C validator. I think that is okay,
*unless* you believe in a non-DNS STD 66 <reg-name>, where
it might be syntactically okay. Hard to decide, potentially
a bug <http://www.w3.org/Bugs/Public/show_bug.cgi?id=5280>.
I don't believe in non-DNS host names.
[back to the general "HTML5 considered hostile to users"]
What are you trying to achieve?
As mentioned about ten times in this thread I typically try
to validate content, as author of the relevant document, or
in a position to edit (in)valid documents.
But why do you want to validate content only when the Content-Type
matters on the Web and you seem to be hostile to the idea of fixing
how your documents are served? What good does it do to serve XHTML
with a custom DTD when real browsers don't read the DTD and don't even
parse the document as XML?
The complete number of HTTP servers under my control at this
second (counting servers where I can edit dot-files used as
configuration files by a popular server) is *zero*. That is
a perfectly normal scenario for many authors and editors.
These days, it is also a relatively easily fixable scenario. In
particular, if you want to be in the business of creating test suites,
getting hosting where you can tweak the Content-Type is generally a
good way to start.
Of course I'm not happy if files are served as chemical/x-pdb
or similar crap, but it is outside my sphere of influence,
Fortunately, it turned out that it was within my sphere of influence:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5446
and not what I'm interested in when I want to know what *I* did to
make the overall picture worse *within* documents edited by me.
A validator can't know what parts you can edit and what parts you
can't. However, if you care about practical stuff, you shouldn't even
enable external entity loading, since browsers don't load external
entities from the network. (That's why the option isn't the default on
Validator.nu.)
Are you trying to check that your Web content doesn't have
obvious technical problems?
Normally, yes. Of course we are discussing mainly my validator
torture test pages, intentionally *unnormal* pages.
Like I said above, I suggest getting better hosting if you want to
host test suites.
Or are you just trying to game a tool to say that your page is
valid?
Rarely. I use image-links hidden by span within pre on one page,
at some point in time validators will tell me that this is a hack,
no matter if it works with all browsers I've ever tested. Sanity
check with validator.nu: Your tool says that this is an error.
Could you provide a URL to a demo page?
Why are you validating pages?
To find bugs.
As far as bugs that affect practical Web usage go, all "bugs" related
to loading external entities are irrelevant...
Thank you for the feedback.
--
Henri Sivonen
http://hsivonen.iki.fi/