W3C's HTML validation service seems to have no such problems. We've been using it to validate all the files on the Unicode site regularly.
A validator *should* look between the > and < in order to catch invalid entity references, especially invalid NCRs. For UTF-8, it would ideally also check that no ill-formed, and therefore illegal, sequences are part of the UTF-8.

A./

At 07:16 AM 12/14/01 -0800, James Kass wrote:
>Welé Negga wrote,
>
> > Does the Clean development team plan to make Concurrent
> > Clean partially or fully Unicode compliant in their future
> > releases, as this is crucial for those of us who use non-European
> > writing systems, and more generally for those who develop
> > truly global applications.
>
>It is crucial for everyone.
>
>Having an HTML validator, like Tidy.exe, which generates errors
>or warnings every time it encounters a UTF-8 sequence is
>unnerving. It's especially irritating when the validator
>automatically converts each byte sequence making up a single UTF-8
>character into two or three HTML named entities.
>
>If UTF-8 is needed in the HTML, and the HTML needs to be valid,
>the user must make a back-up copy of the original HTML, run
>the validator on the back-up, and then manually make corrections
>to the source. This is quite cumbersome and should really be
>unnecessary.
>
>HTML validators should only validate the HTML, that is, the text
>between the HTML brackets "<" and ">", and not affect the actual
>text of the file.
>
>Best regards,
>
>James Kass.
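The two checks described above (flagging NCRs that do not denote a valid Unicode scalar value, and locating ill-formed UTF-8 byte sequences) can be sketched roughly as below. This is a minimal illustration, not any validator's actual implementation; the function names are hypothetical, and the UTF-8 check simply leans on a strict decoder rather than walking the byte-sequence table itself.

```python
import re

def find_utf8_errors(data: bytes):
    """Return byte offsets of ill-formed UTF-8 sequences in raw data.

    Sketch only: uses Python's strict UTF-8 decoder to locate each
    illegal sequence, then resumes scanning just past it.
    """
    errors = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # remainder is well-formed
        except UnicodeDecodeError as exc:
            # exc.start/exc.end are relative to the slice we decoded
            errors.append(pos + exc.start)
            pos += exc.end  # skip past the ill-formed sequence
    return errors

def find_invalid_ncrs(text: str):
    """Return NCRs whose code point is out of range or a surrogate."""
    bad = []
    for m in re.finditer(r"&#(x[0-9A-Fa-f]+|[0-9]+);", text):
        body = m.group(1)
        cp = int(body[1:], 16) if body[0] in "xX" else int(body)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            bad.append(m.group(0))
    return bad
```

For example, `find_invalid_ncrs("&#xD800;")` flags the surrogate reference, while a well-formed sequence such as `b"\xc3\xa9"` (é) passes the byte check untouched.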