Joe Hildebrand wrote:

> What do you mean by character?

The operative question is, what does XML mean by character? Here

http://www.w3.org/TR/2000/WD-xml-2e-20000814#dt-character

seems to define the scope:

***
[Definition: A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646] [E67](see also [ISO/IEC 10646-2000]). Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646. [E69]The versions of these standards cited in A.1 Normative References were current at the time this document was prepared. New characters may be added to these standards by amendments or new editions. Consequently, XML processors must accept any character in the range specified for Char. The use of "compatibility characters", as defined in section 6.8 of [Unicode] [E67](see also D21 in section 3.6 of [Unicode3]), is discouraged.]

Character Range

[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
***

Perhaps you can parse that better than I can.

> Do you have to perform some sort of canonicalization before counting?
> Combining characters make this particularly difficult, which is why we
> settled on something easy to describe and understand in JIDs.

Right. It may not be easy to specify in XML schema, because the length of xs:string is length in characters as defined above.

/psa
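
P.S. To make the counting question concrete, here is a rough Python sketch of my reading of production [2]. The names XML_CHAR_RANGES, is_xml_char, and xml_length are made up for illustration and come from no spec or library. It assumes that "length in characters" means a count of code points in the Char range, which is how I read the definition above; the NFC step is only there to show how canonicalization changes the count.

```python
import unicodedata

# Ranges copied from production [2] Char in the quoted draft.
XML_CHAR_RANGES = (
    (0x9, 0x9),
    (0xA, 0xA),
    (0xD, 0xD),
    (0x20, 0xD7FF),
    (0xE000, 0xFFFD),
    (0x10000, 0x10FFFF),
)


def is_xml_char(code_point: int) -> bool:
    """Return True if the code point falls inside the Char production."""
    return any(low <= code_point <= high for low, high in XML_CHAR_RANGES)


def xml_length(text: str) -> int:
    """Count code points, rejecting anything outside the Char production.

    Assumption: one XML character is one code point. Combining marks
    are counted separately from their base character.
    """
    for ch in text:
        if not is_xml_char(ord(ch)):
            raise ValueError(f"U+{ord(ch):04X} is not a legal XML character")
    return len(text)


# "e" followed by COMBINING ACUTE ACCENT: one visible glyph, two code points.
decomposed = "e\u0301"
# NFC folds the pair into the single precomposed code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(xml_length(decomposed), xml_length(composed))
```

Both strings look the same on screen, yet they count differently unless something is canonicalized first, which seems to be exactly the difficulty Joe is pointing at.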
