Look at this code:
public static String checkCharacterData(String text) {
    if (text == null) {
        return "A null is not a legal XML value";
    }

    // do check
    for (int i = 0, len = text.length(); i < len; i++) {
        if (!isXMLCharacter(text.charAt(i))) {
            // Likely this character can't be easily displayed
            // because it's a control so we use its hexadecimal
            // representation in the reason.
            return ("0x" + Integer.toHexString(text.charAt(i))
                    + " is not a legal XML character");
        }
    }

    // If we got here, everything is OK
    return null;
}
There is a big issue here: what about high Unicode characters, such as &#x10000; and above?
Slide fails one of the litmus WebDAV compliance tests because of this.
Any suggestions on how to patch it? Anyone?
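To make the failure concrete, here is a small sketch of what happens to U+10000 inside a Java String. The class name SurrogateDemo and this isXMLCharacter are stand-ins written for illustration (the real method may differ); the ranges are the 16-bit part of the XML 1.0 Char production, and Character.toChars assumes Java 5 or later.

```java
final class SurrogateDemo {

    // Hypothetical stand-in for a per-char check: the XML 1.0 Char
    // ranges that fit into a single 16-bit char.
    static boolean isXMLCharacter(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    public static void main(String[] args) {
        // U+10000 does not fit in one char, so it is stored as two.
        String s = new String(Character.toChars(0x10000));

        System.out.println(s.length());                        // 2
        System.out.println(Integer.toHexString(s.charAt(0)));  // d800
        System.out.println(Integer.toHexString(s.charAt(1)));  // dc00

        // Each half on its own is outside the legal ranges, so a
        // char-by-char loop rejects the whole string.
        System.out.println(isXMLCharacter(s.charAt(0)));       // false
        System.out.println(isXMLCharacter(s.charAt(1)));       // false
    }
}
```

So the loop sees 0xD800 first and reports it as illegal, even though the character it belongs to is allowed in XML.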
As Julian and I described, the problem most likely lies in surrogate handling. I'd suggest something like this (note: I don't know the Unicode surrogate stuff at all well, so you'll have to look up the details):
for (...) {
    if (!isXMLCharacter(text.charAt(i))) {
        // This character isn't a valid XML character. However, it
        // might be a valid initial surrogate. Check the bounds first
        // so a surrogate at the very end can't run past the string.
        if (i + 1 < text.length()
                && isSurrogate(text.charAt(i))
                && isSurrogate(text.charAt(i + 1))) {
            // Turns out this is ok, but we have to turn it back
            // into a valid single XML character on serialisation
            i++;
        } else {
            return "Not a legal character";
        }
    }
}
What 'isSurrogate' should do is something I don't (yet) know.
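For reference, a rough sketch of one way the check could go, assuming Java 5 or later, where java.lang.Character provides isHighSurrogate, isLowSurrogate and toCodePoint. On an older JVM the same tests are plain range comparisons: 0xD800 to 0xDBFF for the first (high) half and 0xDC00 to 0xDFFF for the second (low) half. The names SurrogateAwareCheck and isXMLCodePoint are made up for illustration and are not part of the code quoted above; the legal ranges follow the Char production of XML 1.0.

```java
final class SurrogateAwareCheck {

    // XML 1.0 Char production, applied to a whole code point instead
    // of a single 16-bit char. (Hypothetical helper.)
    static boolean isXMLCodePoint(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Same contract as the original: null means OK, otherwise a reason.
    static String checkCharacterData(String text) {
        if (text == null) {
            return "A null is not a legal XML value";
        }
        for (int i = 0, len = text.length(); i < len; i++) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate is only legal if a low one follows.
                if (i + 1 < len
                        && Character.isLowSurrogate(text.charAt(i + 1))) {
                    int cp = Character.toCodePoint(c, text.charAt(i + 1));
                    if (isXMLCodePoint(cp)) {
                        i++; // skip the low half of the pair
                        continue;
                    }
                }
                return "0x" + Integer.toHexString(c)
                        + " is an unpaired surrogate";
            }
            // Ordinary chars, and stray low surrogates, end up here.
            if (!isXMLCodePoint(c)) {
                return "0x" + Integer.toHexString(c)
                        + " is not a legal XML character";
            }
        }
        return null;
    }
}
```

Testing for high-then-low, rather than "any surrogate followed by any surrogate", also rejects two high halves in a row or a low half that comes first. Serialisation still has to write the pair out as one character; this sketch only covers the validity check.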
Mike
