W3C's HTML validation service seems to have no such problems.
We've been using it to validate all the files on the Unicode
site regularly.

A validator *should* look between the > and < in order to
catch invalid entity references, esp. invalid NCRs
(numeric character references).
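As a rough sketch of what such a check could look like (a partial, hypothetical example: it only flags NCRs that name surrogates or code points beyond U+10FFFF, not every code point HTML disallows):

```python
import re

# Match decimal (&#8364;) and hexadecimal (&#x20AC;) numeric character references.
NCR = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

def invalid_ncrs(text):
    """Return the NCRs in `text` whose code points are invalid:
    surrogates (U+D800..U+DFFF) or values beyond U+10FFFF."""
    bad = []
    for m in NCR.finditer(text):
        cp = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            bad.append(m.group(0))
    return bad

print(invalid_ncrs("ok: &#x20AC; bad: &#xD800; &#1114112;"))
# -> ['&#xD800;', '&#1114112;']
```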

For UTF-8, it would ideally also check that no ill-formed,
and therefore illegal, byte sequences are part of the file.
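A minimal sketch of such a well-formedness check (assuming a strict decoder is available; Python's built-in UTF-8 codec rejects exactly the ill-formed sequences):

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """True if `data` is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False

print(is_well_formed_utf8("€".encode("utf-8")))  # complete three-byte sequence: True
print(is_well_formed_utf8(b"\xe2\x82"))          # truncated sequence: False
print(is_well_formed_utf8(b"\x80"))              # lone continuation byte: False
```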

A./

At 07:16 AM 12/14/01 -0800, James Kass wrote:

>Welé Negga wrote,
>
> > Does the Clean development team plan to make Concurrent
> > Clean partially or fully Unicode compliant in their future
> > releases, as this is crucial for those of us who use non-European
> > writing systems, and more generally for those who develop
> > truly global applications?
>
>It is crucial for everyone.
>
>Having an HTML validator, like Tidy.exe, which generates errors
>or warnings every time it encounters a UTF-8 sequence is
>unnerving.  It's especially irritating when the validator
>automatically converts the byte sequence encoding a single
>UTF-8 character into two or three HTML named entities.
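[To illustrate the bug described above, a hypothetical sketch: treating each byte of a multi-byte UTF-8 sequence as a separate Latin-1 character and emitting one entity per byte mangles the text, whereas the character should yield a single entity at most.]

```python
# The euro sign U+20AC occupies three bytes in UTF-8: E2 82 AC.
euro = "€"
mangled = "".join("&#%d;" % b for b in euro.encode("utf-8"))
print(mangled)               # &#226;&#130;&#172; -- three bogus entities, one per byte
print("&#%d;" % ord(euro))   # &#8364; -- the single correct numeric reference
```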
>
>If UTF-8 is needed in the HTML, and the HTML needs to be valid,
>the user must make a back-up copy of the original HTML, run
>the validator on the back-up, and then manually make corrections
>to the source.  This is quite cumbersome and should really be
>unnecessary.
>
>HTML validators should only validate the HTML, that is, the text
>between the HTML brackets "<" and ">", and not affect the actual
>text of the file.
>
>Best regards,
>
>James Kass.