Re: validator.nu

Henri Sivonen Mon, 18 Feb 2008 01:12:01 -0800


Disclaimer: Still not a WG response.

Changed W3C list to www-archive, because this reply isn't feedback about HTML 5.


On Feb 17, 2008, at 23:08, Frank Ellermann wrote:

Henri Sivonen wrote:

Validator.nu checks the combination of the protocol
entity body and the Content-Type header. Pretending
that Content-Type didn't matter wouldn't make sense
when it does make a difference in terms of processing
in a browser.


I checked if the W3C validator servers still claim that
application/xml-external-parsed-entity is chemical/x-pdb

This was either fixed, or it is an intermittent problem,
therefore I can continue my I18N tests today.


It was fixed.

XHTML 1 like HTML 4 wants URIs in links.

HTML 4.01 already defined IRI-compatible processing for the path and query parts, so now that there are actual IRIs, making Validator.nu complain about them doesn't seem particularly productive.

For experiments with
IRIs I created a homebrewn XHTML 1 i18n document type.

Actually the same syntax renaming URI to IRI everywhere,
updating RFC 2396 + 3066 to 3987 + 4646 in DTD comments,

That's a pointless exercise, because neither browsers nor validators ascribe meaning to DTD comments or production identifiers.

To get some results related to the *content* of my test
files I have to set three options explicitly:

* Be "lax" about HTTP content - whatever that is, XHTML 1
 does not really say "anything goes", but validator.nu
 apparently considers obscure "advocacy" pages instead
 of the official XHTML 1 specification as "normative".

Validator.nu treats HTML 5 as normative and media type-based dispatching in browsers as congruent de facto guidance.
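
To illustrate what media type-based dispatching means here, the following is a minimal sketch, not Validator.nu's or any browser's actual code. The function name parser_for and the exact set of XML media types are assumptions made for the example; the point is only that the Content-Type header, not the document's own DOCTYPE, decides which parser sees the bytes.

```python
def parser_for(content_type: str) -> str:
    """Pick a parser from a Content-Type header value (hypothetical helper)."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()

    if media_type == "text/html":
        # Served as text/html: parsed as HTML, even if the bytes are XHTML 1.
        return "html"

    xml_types = ("application/xhtml+xml", "application/xml", "text/xml")
    if media_type in xml_types or media_type.endswith("+xml"):
        return "xml"

    # Anything else (for example chemical/x-pdb) is not treated as markup.
    return "unsupported"


if __name__ == "__main__":
    for header in ("text/html; charset=utf-8",
                   "application/xhtml+xml",
                   "chemical/x-pdb"):
        print(header, "->", parser_for(header))
```

Under this model an XHTML 1 document served as text/html goes to the HTML parser, so its DTD never comes into play.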

With those three explicitly set options it could finally
report that my test page is "valid" XHTML 1 transitional.

But it's *not*, it uses real IRIs in places where only URIs
are allowed, a major security flaw in DTD based validators:
<http://omniplex.blogspot.com/2007/11/broken-validators.html>


I've fixed the schema preset labeling to say "+ IRI".

| Warning: XML processors are required to support the UTF-8
| and UTF-16 character encodings. The encoding was KOI8-R
| instead, which is an incompatibility risk.

Untested, I hope US-ASCII wouldn't trigger this warning, as
a mobile-ok prototype did some months ago (and maybe still
does).

US-ASCII and ISO-8859-1 (their preferred IANA names only) don't trigger that warning, because I don't have evidence of XML processors that didn't support those two in addition to the required encodings.
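
The rule described above can be sketched roughly as follows. This is an illustration only, not Validator.nu's implementation: the names REQUIRED, TOLERATED and encoding_warning are hypothetical, and comparing names case-insensitively is an assumption. Aliases such as "latin1" are deliberately not recognised, matching the "preferred IANA names only" remark.

```python
from typing import Optional

# Encodings every XML processor must support.
REQUIRED = {"utf-8", "utf-16"}
# Encodings exempted from the warning, matched by preferred IANA name only.
TOLERATED = {"us-ascii", "iso-8859-1"}


def encoding_warning(declared: str) -> Optional[str]:
    """Return a warning message for a declared encoding, or None if silent."""
    name = declared.strip().lower()
    if name in REQUIRED or name in TOLERATED:
        return None
    return (
        "XML processors are required to support the UTF-8 and UTF-16 "
        f"character encodings. The encoding was {declared} instead, "
        "which is an incompatibility risk."
    )


if __name__ == "__main__":
    for enc in ("UTF-8", "US-ASCII", "ISO-8859-1", "KOI8-R", "latin1"):
        print(enc, "->", encoding_warning(enc))
```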

Validator.nu accepts U-labels (UTF-8) in system identifiers,
W3C validator doesn't, and I also think they aren't allowed
in XML 1.0 (all editions).  Martin suggested they are okay,
see <http://www.w3.org/Bugs/Public/show_bug.cgi?id=5279>.

Validator.nu URIfies system ids using the Jena IRI library set to the XML system id mode.

Considering that the XML spec clearly sought to allow IRIs ahead of the IRI spec, would it be actually helpful to change this even if a pedantic reading of specs suggested that the host part should be in Punycode?
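
For readers unfamiliar with the distinction being argued over, here is a rough sketch of turning an IRI into a URI: the host's U-labels become Punycode A-labels and non-ASCII characters in the path and query are percent-encoded as UTF-8. It does not reproduce the Jena IRI library's behaviour. The function iri_to_uri and the example host are hypothetical, userinfo is ignored for brevity, and Python's built-in "idna" codec implements IDNA 2003, which is an assumption about which IDNA rules apply.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Convert an IRI to an ASCII-only URI (simplified, hypothetical helper)."""
    parts = urlsplit(iri)

    # Host: convert each label to its ASCII-compatible (Punycode) form.
    host = parts.hostname or ""
    ascii_host = host.encode("idna").decode("ascii") if host else ""
    if parts.port is not None:
        ascii_host += f":{parts.port}"

    # Path and query: percent-encode non-ASCII characters as UTF-8,
    # leaving existing escapes and reserved characters untouched.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=~-._")
    query = quote(parts.query, safe="/%:@!$&'()*+,;=?~-._")

    return urlunsplit((parts.scheme, ascii_host, path, query, parts.fragment))


if __name__ == "__main__":
    print(iri_to_uri("http://bücher.example/dtd/ä.dtd"))
```

A system identifier that is already plain ASCII passes through unchanged, so the conversion only matters for documents that actually use IRIs.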

Validator.nu rejects percent encoded UTF-8 labels in system
identifiers, like the W3C validator.  I think that is okay,
*unless* you believe in a non-DNS STD 66 <reg-name>, where
it might be syntactically okay.  Hard to decide, potentially
a bug <http://www.w3.org/Bugs/Public/show_bug.cgi?id=5280>.


I don't believe in non-DNS host names.

[back to the general "HTML5 considered hostile to users"]

What are you trying to achieve?


As mentioned about ten times in this thread I typically try
to validate content, as author of the relevant document, or
in a position to edit (in)valid documents.

But why do you want to validate content only when the Content-Type matters on the Web and you seem to be hostile to the idea of fixing how your documents are served? What good does it do to serve XHTML with a custom DTD when real browsers don't read the DTD and don't even parse the document as XML?

The complete number of HTTP servers under my control at this
second (counting servers where I can edit dot-files used as
configuration files by a popular server) is *zero*.  That is
a perfectly normal scenario for many authors and editors.

These days, it is also a relatively easily fixable scenario. In particular, if you want to be in the business of creating test suites, getting hosting where you can tweak the Content-Type is generally a good way to start.

Of course I'm not happy if files are served as chemical/x-pdb
or similar crap, but it is outside my sphere of influence,


Fortunately, it turned out that it was within my sphere of influence:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=5446

and not what I'm interested in when I want to know what *I* did to
make the overall picture worse *within* documents edited by me.

A validator can't know what parts you can edit and what parts you can't. However, if you care about practical stuff, you shouldn't even enable external entity loading, since browsers don't load external entities from the network. (That's why the option isn't the default on Validator.nu.)

Are you trying to check that your Web content doesn't have
obvious technical problems?


Normally, yes.  Of course we are discussing mainly my validator
torture test pages, intentionally *unnormal* pages.

Like I said above, I suggest getting better hosting if you want to host test suites.

Or are you just trying to game a tool to say that your page is
valid


Rarely.  I use image-links hidden by span within pre on one page,
at some point in time validators will tell me that this is a hack,
no matter if it works with all browsers I've ever tested.  Sanity
check with validator.nu:  Your tool says that this is an error.


Could you provide a URL to a demo page?

Why are you validating pages?


To find bugs.

As far as bugs that affect practical Web usage go, all "bugs" related to loading external entities are irrelevant...


Thank you for the feedback.

--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/
