[WSG] Validating and validators

2006-03-16 Thread Kat

G'day,

For a good while now I have been using A Real Validator to validate my 
html offline.


Recently, I went back and did a quick search to see what sort of 
validators are around, and I came across a couple of interesting things:


1. What are your opinions of SGML-parsers vs linters? Do both have their 
place? Do they have different roles?


2. Is Validome an SGML parser or linter?

3. How accurate do you believe is Validome's statement of errors?
http://www.validome.org/lang/en/errors/ALL

4. What is the most successful way of ensuring correct and valid html 
and/or xhtml (considering different validators have different errors)?


Kat

**
The discussion list for  http://webstandardsgroup.org/

See http://webstandardsgroup.org/mail/guidelines.cfm
for some hints on posting to the list & getting help
**



Re: [WSG] Validating and validators

2006-03-16 Thread Lachlan Hunt

Kat wrote:
For a good while now I have been using A Real Validator to validate my 
html offline.


That's good.

1. What are your opinions of SGML-parsers vs linters? Do both have their 
place? Do they have different roles?


SGML parsers will tell you exactly what's wrong with your document 
according to the formal definition in the SGML declaration and the 
document's DTD, and they are the best choice for HTML documents.  SGML-based 
validators may have limitations with XML documents (this includes any based 
on OpenSP, such as the W3C/WDG validators and A Real Validator).  Most, 
if not all, of these limitations relate to well-formedness errors, which 
will be picked up by a browser anyway when you use an XML MIME type.
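
To illustrate what a well-formedness check does, here is a minimal sketch using the XML parser from Python's standard library.  It is only a stand-in for what a browser does with an XML MIME type, not a validator; the helper name well_formedness_error and the two sample fragments are made up for the example.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(source: str):
    """Return None if source is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(source)
    except ET.ParseError as exc:
        return str(exc)
    return None


good = "<p>Hello <em>world</em></p>"
bad = "<p>Hello <em>world</p>"  # <em> is never closed

print(well_formedness_error(good))  # no error to report
print(well_formedness_error(bad))   # the parser's description of the problem
```

Note that this only checks well-formedness; it says nothing about whether the document is valid against a DTD.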


In general, lints are mostly quite useless for validation, and they often 
lie.  Some, though, like HTML Tidy (which is one of the most useless 
tools for validation, IMHO), are still reasonably good at cleaning up 
really messy documents so that they can at least be worked with.



2. Is Validome an SGML parser or linter?


It's a lint.


3. How accurate do you believe is Validome's statement of errors?
http://www.validome.org/lang/en/errors/ALL


XML Declaration:
  All the errors not caught by the W3C validator reflect limitations in 
its XML support, but all of them are well-formedness errors that will 
be caught by any decent browser when you use the correct MIME type.


Error in the Document Type Declaration:
None of the following are caught by true SGML-based validators, but 
that's because none of them are actually errors.  Any validator that 
chooses to raise these issues should call them warnings, not errors.


* System-ID missing (at PUBLIC) in HTML-Document
* Missing White Space between Public-ID and System-ID
* Upper and lower case at HTML-Documents
* Document Type Declaration in commentary area
* Unallowed overwriting of parameter entities
* HTML-Document with user-defined DTD
* HTML-Document with unknown Public-ID and user-defined DTD

I don't have time to go through the rest right now; I may do so later.

4. What is the most successful way of ensuring correct and valid html 
and/or xhtml (considering different validators have different errors)?


For HTML:
  Use a real SGML-based validator.  In general, I prefer the W3C 
validator, but the WDG validator emits additional warnings that the W3C 
validator does not, and these can be useful for making documents more 
compatible with real browsers.  For example, it will give warnings about 
SHORTTAG NET usage (often just a result of XML syntax in an HTML document).
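
As a rough illustration of where such warnings come from, the sketch below scans HTML source for XML-style empty-element tags such as <br />.  The regular expression is a heuristic invented for this example; it is not how the WDG validator detects NET usage, and it is no substitute for a real SGML parser.

```python
import re

# Heuristic: a start tag whose last character before ">" is "/".
XML_EMPTY_TAG = re.compile(r"<[A-Za-z][^<>]*/>")


def find_xml_style_tags(html: str):
    """Return every tag in html written with XML empty-element syntax."""
    return XML_EMPTY_TAG.findall(html)


sample = '<p>Line one<br />Line two<img src="a.png" alt=""></p>'
print(find_xml_style_tags(sample))  # ['<br />']
```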


For XHTML:
  Make sure you develop and test the page using the correct MIME type. 
 That will catch any well-formedness errors immediately (including 
those not caught by the SGML validators).  Generally speaking, a 
validator that uses a real XML parser would be best, though the W3C/WDG 
validators are still very good, especially if you've already ruled out 
the well-formedness errors that they won't catch.  The W3C/WDG validators 
will also give some useful warnings that a real XML parser won't, such 
as "reference to a non-SGML character".  That's very useful for 
detecting common mistakes like &#146; instead of &#x2019;.
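
A small sketch of why that particular mistake happens, using only Python's standard library.  The assumption here is the usual cause: byte 146 in the Windows-1252 encoding is the right single quotation mark, but as a numeric character reference 146 means code point U+0092, which is a control character.

```python
import unicodedata

c1_control = chr(146)        # the code point that &#146; actually refers to
right_quote = chr(0x2019)    # the code point that &#x2019; refers to

# U+0092 is in the C1 control range, not a punctuation character.
print(unicodedata.category(c1_control))   # Cc

# U+2019 is the character the author almost certainly wanted.
print(unicodedata.name(right_quote))      # RIGHT SINGLE QUOTATION MARK

# Byte 146 only means "right quote" when read as Windows-1252.
print(bytes([146]).decode("cp1252") == right_quote)  # True
```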


--
Lachlan Hunt
http://lachy.id.au/