RE: xml names, webdav namespaces, and isUnicodeIdentifierPart()

Pill, Juergen Fri, 03 Aug 2001 17:49:58 -0700
Sounds great!

Juergen


 -----Original Message-----
From:   Shawn C. Dodd [mailto:[EMAIL PROTECTED]] 
Sent:   Thursday, August 02, 2001 16.26 PM
To:     [EMAIL PROTECTED]
Subject:        RE: xml names, webdav namespaces, and
isUnicodeIdentifierPart()

Hey Michael,

I've been looking at that code recently.  The company I work for is using
the Slide client library to talk to Microsoft Exchange 2000 (mostly for
address book and calendar functionality).  The code you refer to below has
been causing us some consternation, in that it doesn't seem to properly
handle multiple namespaces in the same request - besides the fact that it
doesn't properly separate name from namespace in all circumstances.  I'd
like to propose a solution to both problems at once.

Determining what characters are or aren't legal in element names isn't the
job of Slide - it's the job of an XML parser, and I think we should let a
parser do that job.  I propose that we augment your solution, below, with a
simple change.  Instead of accepting an enumeration of strings representing
the element's namespace/name pair, the propfind could optionally accept an
enumeration of objects each representing a property.

If, for the sake of argument, propfind checked the type of the objects in
the enumeration it's passed, it could handle Property objects specially.
The org.apache.webdav.lib.Property interface, as you know, contains separate
fields for namespace, name, and local name, so using Property obviates the
need to do the parsing in the Slide library.  The old-style enumeration of
strings would still be supported for backwards compatibility.

This solution wouldn't require a change in PropFindMethod's public
interface.
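
To make the proposal concrete, here is a minimal sketch of a request-body builder that checks the type of each element in the enumeration. It is not the Slide code: PropertyName is a hypothetical stand-in for org.apache.webdav.lib.Property, buildBody and the generated nsN prefixes are invented for illustration, and legacy strings are assumed to be bare names in the DAV: namespace.

```java
import java.util.Enumeration;

final class PropFindBodySketch {
    // Hypothetical stand-in for a property object that already carries
    // its namespace and local name separately.
    interface PropertyName {
        String getNamespaceURI();
        String getLocalName();
    }

    // Convenience factory for the sketch.
    static PropertyName of(final String namespaceUri, final String localName) {
        return new PropertyName() {
            public String getNamespaceURI() { return namespaceUri; }
            public String getLocalName() { return localName; }
        };
    }

    // Builds a PROPFIND body from a mixed enumeration. No XML escaping is
    // done here; a real implementation would need it.
    static String buildBody(Enumeration<?> names) {
        StringBuilder sb = new StringBuilder("<D:propfind xmlns:D=\"DAV:\"><D:prop>");
        int counter = 0;
        while (names.hasMoreElements()) {
            Object o = names.nextElement();
            if (o instanceof PropertyName) {
                // New style: no string parsing, each property declares its own namespace.
                PropertyName p = (PropertyName) o;
                String prefix = "ns" + counter++;
                sb.append('<').append(prefix).append(':').append(p.getLocalName())
                  .append(" xmlns:").append(prefix).append("=\"")
                  .append(p.getNamespaceURI()).append("\"/>");
            } else {
                // Old style: assumed to be a bare DAV: property name.
                sb.append("<D:").append(o).append("/>");
            }
        }
        return sb.append("</D:prop></D:propfind>").toString();
    }
}
```

Because the method still takes a plain Enumeration, existing callers that pass strings keep working, and several namespaces can appear in one request.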

I'm totally open to other solutions as well.  I just wanted to get the ball
rolling...


Shawn

-----Original Message-----
From: msmith [mailto:msmith]On Behalf Of Michael Smith
Sent: Tuesday, July 31, 2001 9:25 PM
To: [EMAIL PROTECTED]
Subject: xml names, webdav namespaces, and isUnicodeIdentifierPart()

Hi all,

I just committed a little workaround for a problem (using the client
libraries) doing a propfind on any property with '-' in the property
name (current-user-privilege-set, specifically).

The library was doing this: to split an element name into namespace
abbreviation and local name, it walked backwards from the end of the
string, checking each character to see if it was a valid part of an xml
name. If it reached the start, there was no namespace. If it didn't, it
could split into namespace and name at that point. This is fine, but
the check for 'valid part of xml name' appeared broken.
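
Roughly, the backward scan looks like the sketch below. This is a reconstruction from the description, not the library's actual code: NameSplitter, isNamePart and split are made-up names, and the "prefix:name" input shape is an assumption.

```java
final class NameSplitter {
    // Stand-in for the character check used before the workaround.
    static boolean isNamePart(char c) {
        return Character.isUnicodeIdentifierPart(c);
    }

    // Returns {prefix, name}. The prefix is null when the scan reaches the
    // start of the string without meeting a non-name character.
    static String[] split(String qualified) {
        int i = qualified.length() - 1;
        while (i >= 0 && isNamePart(qualified.charAt(i))) {
            i--;
        }
        if (i < 0) {
            return new String[] { null, qualified };
        }
        // Everything before the first rejected character is taken as the prefix.
        return new String[] { qualified.substring(0, i), qualified.substring(i + 1) };
    }
}
```

With this check, "D:displayname" splits at the colon as intended, but "D:current-user-privilege-set" splits at the last '-', leaving "set" as the name.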

It was calling Character.isUnicodeIdentifierPart(chr). The javadoc is
somewhat evasive on what _exactly_ this does (I assume it's fairly well
defined by the unicode standard, but I couldn't find the right
information in a look around unicode.org, and I don't have a copy of the
unicode book). The XML spec is explicit about exactly which codepoints
are allowed. Specifically, as well as several standard unicode character
classes (letters, digits, etc.), it gives '-', '.', and '_' as allowed.
isUnicodeIdentifierPart() rejects '-'. The javadoc says that '_' is
allowed, and I think (though I didn't test this explicitly) that '.' is
also allowed.

For now, I've changed this check to isUnicodeIdentifierPart(chr) ||
chr=='-'. This works, but seems terribly inelegant. I also suspect that
isUnicodeIdentifierPart() lets through some things that it shouldn't,
though I haven't checked into this properly.
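
For comparison, here is a small sketch that puts the workaround next to a rough approximation of the XML 1.0 name-character rule. NameCharCheck and both method names are invented for illustration, and approxXmlNameChar is only an approximation: it uses Character.isLetterOrDigit in place of the spec's explicit Letter and Digit tables, and leaves out combining characters, extenders and ':'.

```java
final class NameCharCheck {
    // The workaround described above: the original check plus '-'.
    static boolean workaround(char c) {
        return Character.isUnicodeIdentifierPart(c) || c == '-';
    }

    // Approximate XML 1.0 name character. The colon is deliberately left out
    // because it separates prefix from name in this context.
    static boolean approxXmlNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '.' || c == '-' || c == '_';
    }

    public static void main(String[] args) {
        // Probe the punctuation discussed above plus one control character.
        char[] probes = { '-', '.', '_', '\u0000' };
        for (char c : probes) {
            System.out.println((int) c
                + " workaround=" + workaround(c)
                + " approx=" + approxXmlNameChar(c));
        }
    }
}
```

Two things stand out when this runs: isUnicodeIdentifierPart() rejects '.' as well as '-', and it accepts identifier-ignorable control characters such as NUL, which are not legal in an XML name.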

Are there any unicode and/or xml experts on the list who could weigh in
with an opinion on what the correct check is, here? It's a minor detail,
really, but it's the sort of minor detail that has a tendency to bite
you later if you don't get it right the first time.

Michael