[EMAIL PROTECTED] wrote:
> Briefly, it's my opinion that applications which claim to support
> and comply with Unicode should not 'step on' Unicode text. Any
> loopholes in the 'letter of the law' which allow applications to
> mung or reject Unicode text should be plugged.
If this "plugging" is to be done, it should also be done for HTML and XML. For now, combining characters can be encoded directly after the quote character (single or double) that opens an attribute value, or directly after a tag-closing ">". HTML and XML parsers treat the quote or greater-than sign as a syntax character on its own and ignore the combining characters that follow it, which leaves a defective combining sequence at the start of the content. This is a problem.

My opinion is that HTML and XML parsers should not take the quote or greater-than sign in isolation, without considering the whole combining sequence. This means that such occurrences should be considered syntax errors. If one really wants to create a Unicode-compliant XML/HTML document containing defective sequences, these sequences should be encoded with numeric character references.

An XML/HTML code generator that produces a serialized document should therefore know the list of combining characters, and encode them with numeric character references when their use is defective (at the beginning of a CDATA section, of an attribute value, or of a text element). This would completely "plug the hole".
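
To make the generator side of this concrete, here is a minimal sketch in Python. It assumes that "combining character" can be approximated by the Unicode general categories Mn, Mc and Me, and the function names (is_combining, escape_leading_combining, serialize_attribute) are hypothetical, not taken from any existing serializer.

```python
import unicodedata


def is_combining(ch: str) -> bool:
    # Assumption: treat every character in general category M* (Mn, Mc, Me)
    # as a combining character.
    return unicodedata.category(ch).startswith("M")


def escape_leading_combining(text: str) -> str:
    """Replace combining characters at the start of text with numeric
    character references, so they cannot attach to a preceding quote or '>'."""
    i = 0
    while i < len(text) and is_combining(text[i]):
        i += 1
    escaped = "".join(f"&#x{ord(c):04X};" for c in text[:i])
    return escaped + text[i:]


def serialize_attribute(name: str, value: str) -> str:
    """Hypothetical helper: write name="value" with the usual markup
    escapes, then protect any leading combining characters."""
    value = value.replace("&", "&amp;").replace('"', "&quot;").replace("<", "&lt;")
    return f'{name}="{escape_leading_combining(value)}"'
```

For example, serialize_attribute("title", "\u0308x") would emit the leading U+0308 as a character reference rather than placing it right after the opening quote. Combining characters that follow a base character inside the value are left untouched, since they are not defective there.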

