Re: Bug in JDOM char verification

Michael Smith Thu, 13 Nov 2003 16:42:06 -0800

Stefano Mazzocchi wrote:

Look at this code:

    public static String checkCharacterData(String text) {
        if (text == null) {
            return "A null is not a legal XML value";
        }

        // do check
        for (int i = 0, len = text.length(); i<len; i++) {
            if (!isXMLCharacter(text.charAt(i))) {
                // Likely this character can't be easily displayed
                // because it's a control so we use its hexadecimal
                // representation in the reason.
                return ("0x" + Integer.toHexString(text.charAt(i))
                 + " is not a legal XML character");
            }
        }

        // If we got here, everything is OK
        return null;
    }

There is a big issue here: what about high Unicode characters like &#x10000; and above?

Slide fails to pass one of the litmus WebDAV compliance tests because of this.

Any suggestions on how to patch it? Anyone?

As Julian and I described, the problem is likely to do with surrogate handling. I'd suggest something like this (note: I don't know the Unicode surrogate stuff at all well, so you'll have to look up the details):

for(...) {
    if(!isXMLCharacter(text.charAt(i))) {
        // This character isn't a valid XML character on its own. However,
        // it might be the first half of a valid surrogate pair.
        if(i + 1 < text.length()
                && isSurrogate(text.charAt(i)) && isSurrogate(text.charAt(i+1))) {
            // Turns out this is OK, but we have to turn the pair back
            // into a single valid XML character on serialisation.
            i++; // skip the second half of the pair
        }
        else {
            return "Not a legal character";
        }
    }
}

What 'isSurrogate' should do is something I don't (yet) know.
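
For reference, here is a minimal sketch of how the surrogate test could look. It assumes the standard UTF-16 ranges (high surrogates U+D800 to U+DBFF, low surrogates U+DC00 to U+DFFF) and splits the single 'isSurrogate' into two hypothetical helpers, since a pair is only valid as a high half followed by a low half. The isXMLCharacter below is only a stand-in built from the XML 1.0 Char production for 16-bit values, not JDOM's actual implementation, and the class name is made up for illustration.

```java
final class SurrogateCheck {

    // Hypothetical helper: first half of a surrogate pair (U+D800..U+DBFF).
    static boolean isHighSurrogate(char c) {
        return c >= 0xD800 && c <= 0xDBFF;
    }

    // Hypothetical helper: second half of a surrogate pair (U+DC00..U+DFFF).
    static boolean isLowSurrogate(char c) {
        return c >= 0xDC00 && c <= 0xDFFF;
    }

    // Stand-in for JDOM's check (an assumption): the XML 1.0 Char
    // production, restricted to values that fit in a single char.
    static boolean isXMLCharacter(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD);
    }

    static String checkCharacterData(String text) {
        if (text == null) {
            return "A null is not a legal XML value";
        }
        for (int i = 0, len = text.length(); i < len; i++) {
            char c = text.charAt(i);
            if (isXMLCharacter(c)) {
                continue;
            }
            // A high surrogate followed by a low surrogate encodes one
            // character in U+10000..U+10FFFF, all of which XML 1.0 allows.
            if (isHighSurrogate(c) && i + 1 < len
                    && isLowSurrogate(text.charAt(i + 1))) {
                i++; // skip the low half of the pair
                continue;
            }
            return "0x" + Integer.toHexString(c)
                    + " is not a legal XML character";
        }
        // If we got here, everything is OK
        return null;
    }
}
```

With this, a string containing U+10000 (written "\uD800\uDC00" in Java source) passes, while a lone or reversed surrogate is still rejected. Whether the serialiser then writes the pair out as one character is a separate question that this sketch does not cover.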

Mike

