Ok, I've been doing some fixing up on this. Firstly, what I've done is
written a separate class, XalanParsedURI (can change the name if someone can
think of a better one). Roughly this goes:
class XalanParsedURI
{
public:
    // accessors for:
    //   scheme
    //   authority
    //   path
    //   query
    //   fragment

    // utility

    // crack a URL into bits
    void parse(various string types);

    // put it back together again
    XalanDOMString make() const;

    // resolve relative to base
    void resolve(const XalanParsedURI &base + various string types);

    // helper for just doing a resolve
    static XalanDOMString resolve(various string types);
};
(If you're not familiar with RFC 2396's terminology: scheme = protocol (http,
ftp, etc.), authority = server, fragment = bookmark, and the rest as usual.)
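
To make the parse/make pair a bit more concrete, here is a minimal sketch of
how cracking a URI into those five components could look, using the regular
expression given in RFC 2396, Appendix B. This is an illustration only, not
the actual XalanParsedURI code: ParsedURI, parseURI and makeURI are made-up
names, and std::string stands in for XalanDOMString.

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for XalanParsedURI (std::string instead of XalanDOMString).
struct ParsedURI
{
    std::string scheme, authority, path, query, fragment;
    // A component can be present but empty (e.g. the authority in "file:///x"),
    // so presence is tracked separately from the text.
    bool hasScheme = false, hasAuthority = false, hasQuery = false, hasFragment = false;
};

// Crack a URI reference into its components with the RFC 2396 Appendix B regex.
// If the string does not match, every component is left empty.
ParsedURI parseURI(const std::string& uri)
{
    static const std::regex pattern(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    ParsedURI r;
    std::smatch m;
    if (std::regex_match(uri, m, pattern))
    {
        r.hasScheme = m[1].matched;    r.scheme = m[2].str();
        r.hasAuthority = m[3].matched; r.authority = m[4].str();
        r.path = m[5].str();
        r.hasQuery = m[6].matched;     r.query = m[7].str();
        r.hasFragment = m[8].matched;  r.fragment = m[9].str();
    }
    return r;
}

// Put the components back together again, adding delimiters only for the
// components that were present.
std::string makeURI(const ParsedURI& u)
{
    std::string out;
    if (u.hasScheme) out += u.scheme + ":";
    if (u.hasAuthority) out += "//" + u.authority;
    out += u.path;
    if (u.hasQuery) out += "?" + u.query;
    if (u.hasFragment) out += "#" + u.fragment;
    return out;
}
```

One thing this shows: a Windows file spec such as c:\foo.xml parses with
scheme "c" and path "\foo.xml", since a drive letter is indistinguishable from
a scheme under this grammar.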
Now I've had quite a long stare at URISupport. My reading of it is:
class URISupport
{
    typedef XalanAutoPtr<XMLURL> URLAutoPtrType;

    static URLAutoPtrType getURLFromString(various types and methods of calling);
        Used to get a Xerces type reference, presumably for calling the
        Xerces parser

    getURLStringFromString(one string argument)
        Used when there is no base context

    getURLStringFromString(two string arguments)
        Used when there is a base context

    NormalizeURIText(various methods)
        Used to flip slashes around (\ -> /)

    constants for file://, file:///
};
A few gripes. Firstly, the naming. The names are inconsistent (URI/URL).
Secondly, they don't tell you much (took me ages to figure out quite what
was going on); e.g. getURLStringFromString really means 'resolve relative to
base or application context' (more on that in a second). NormalizeURIText
just flips slashes, so could be called flipSlashes or something along those
lines (fixUpWindowsJunk?).
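
For reference, the slash flipping described above boils down to something
like the following. This is a sketch using std::string and the suggested name
flipSlashes, not the existing NormalizeURIText code.

```cpp
#include <algorithm>
#include <string>

// Replace every backslash with a forward slash and return the result.
// The argument is taken by value so the caller's string is left untouched.
std::string flipSlashes(std::string text)
{
    std::replace(text.begin(), text.end(), '\\', '/');
    return text;
}
```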
Next, we've got a wrapper for the URISupport class (in which all of the
functions are static anyway) in StylesheetExecutionContext. I can't really
see the point of this unless the wrappers are going to use the URL of the
stylesheet as a base for resolving relative URIs (which they don't).
The implementation of the no base context/base context cases is a little
weird too. It looks to me like we've got:
- Has a known scheme: all OK
- If not, assume an OS file spec and use scheme "file:". Do something OS
specific to get the full path. If it starts with a slash, prepend file://;
if not, prepend file:///.
What is essentially happening here is that the application is given a base
context of "file:///". If you look at section 5.1 of the RFC, we are in the
outermost box of the little hierarchy picture. So this is just a resolve with
the base context file:/// - nothing else needed. Note here that file:/// is
resolver specific -- what goes in the file protocol is up to the resolver.
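
To illustrate "just a resolve with the base context file:///", here is a
simplified sketch that loosely follows the resolution steps in RFC 2396,
section 5.2. It is not the XalanParsedURI implementation: all names
(ParsedURI, parseURI, makeURI, mergePaths, resolveURI) are made up,
std::string stands in for XalanDOMString, and two shortcuts are assumptions
of mine: ".." segments that would climb above the root are dropped, and an
empty base path is treated as "/".

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for XalanParsedURI.
struct ParsedURI
{
    std::string scheme, authority, path, query, fragment;
    bool hasScheme = false, hasAuthority = false, hasQuery = false, hasFragment = false;
};

ParsedURI parseURI(const std::string& uri)
{
    // Regex from RFC 2396, Appendix B.
    static const std::regex pattern(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    ParsedURI r;
    std::smatch m;
    if (std::regex_match(uri, m, pattern))
    {
        r.hasScheme = m[1].matched;    r.scheme = m[2].str();
        r.hasAuthority = m[3].matched; r.authority = m[4].str();
        r.path = m[5].str();
        r.hasQuery = m[6].matched;     r.query = m[7].str();
        r.hasFragment = m[8].matched;  r.fragment = m[9].str();
    }
    return r;
}

std::string makeURI(const ParsedURI& u)
{
    std::string out;
    if (u.hasScheme) out += u.scheme + ":";
    if (u.hasAuthority) out += "//" + u.authority;
    out += u.path;
    if (u.hasQuery) out += "?" + u.query;
    if (u.hasFragment) out += "#" + u.fragment;
    return out;
}

// Append relPath to the "directory" part of basePath, then drop "." and ".."
// segments. Assumption: ".." above the root is silently discarded.
std::string mergePaths(const std::string& basePath, const std::string& relPath)
{
    const std::string merged = basePath.substr(0, basePath.rfind('/') + 1) + relPath;
    // A rooted path splits into an empty first segment, which must never be popped.
    const std::size_t floor = (!merged.empty() && merged[0] == '/') ? 1 : 0;
    std::vector<std::string> kept;
    std::size_t start = 0;
    for (;;)
    {
        const std::size_t slash = merged.find('/', start);
        const bool last = (slash == std::string::npos);
        const std::string seg =
            merged.substr(start, last ? std::string::npos : slash - start);
        if (seg == "..") { if (kept.size() > floor) kept.pop_back(); }
        else if (seg != ".") kept.push_back(seg);
        // A trailing "." or ".." names a directory, so keep the trailing slash.
        if (last && (seg == "." || seg == "..")) kept.push_back("");
        if (last) break;
        start = slash + 1;
    }
    std::string out;
    for (std::size_t i = 0; i < kept.size(); ++i)
    {
        if (i != 0) out += '/';
        out += kept[i];
    }
    return out;
}

std::string resolveURI(const std::string& reference, const std::string& base)
{
    const ParsedURI ref = parseURI(reference);
    if (ref.hasScheme) return reference;           // already absolute
    ParsedURI out = parseURI(base);                // start from the base, then override
    out.fragment = ref.fragment; out.hasFragment = ref.hasFragment;
    if (ref.path.empty() && !ref.hasAuthority && !ref.hasQuery)
        return makeURI(out);                       // reference to the base document
    out.query = ref.query; out.hasQuery = ref.hasQuery;
    if (ref.hasAuthority)                          // network path: "//host/..."
    {
        out.authority = ref.authority; out.hasAuthority = true;
        out.path = ref.path;
    }
    else if (!ref.path.empty() && ref.path[0] == '/')
        out.path = ref.path;                       // absolute path
    else                                           // relative path: merge with base
        out.path = mergePaths(out.path.empty() ? "/" : out.path, ref.path);
    return makeURI(out);
}
```

With this, resolveURI("foo.xml", "file:///") gives "file:///foo.xml", while
anything that already carries a scheme comes back unchanged.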
Now we've got a couple of hacks:
- Unknown schemes are parsed as no scheme at all
- An OS specific function is being called to get the "full path" of the
file
- Slashes are being flipped around
I believe this bunch are designed so that people can do "c:\foo.xml" on
windows (no scheme/flips); or "foo.xml" and have that resolved to the
working directory (fullpath thing).
I would be entirely in favour of dropping these as they just seem like poor
hacks at best. If the support needs to be there, then I would suggest that
it is by:
- including the ability to set an application base context
- having that default to the local file system (which is per OS and
therefore -ugh- but still, you might work this by calling Xerces'
getFullPath(".") for example).
- such things would probably be done in the global Initialise() function
This still drops the windows slashes hack, but that's just ugly.
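
As a rough picture of what a settable application base context might look
like, here is a small sketch. Everything in it is hypothetical: the class
name ApplicationBaseContext, its functions, and the idea that the working
directory is handed in by the caller (the sketch does not call Xerces
itself). It uses std::string rather than XalanDOMString and makes no attempt
at converting native Windows paths or escaping characters.

```cpp
#include <string>

// Hypothetical holder for the application-wide base URI.
class ApplicationBaseContext
{
public:
    // Would be called once from the global initialisation routine.
    // workingDir is assumed to be an absolute directory path that already
    // uses forward slashes, obtained from whatever the platform layer offers.
    static void initialise(const std::string& workingDir)
    {
        std::string path = workingDir;
        if (path.empty() || path[0] != '/') path.insert(0, "/"); // "C:/x" -> "/C:/x"
        if (path[path.size() - 1] != '/') path += '/';           // mark it as a directory
        base() = "file://" + path;
    }

    // Lets an application override the default with any base URI.
    static void set(const std::string& uri) { base() = uri; }

    static const std::string& get() { return base(); }

private:
    // Falls back to "file:///" until initialise() or set() is called.
    static std::string& base()
    {
        static std::string value("file:///");
        return value;
    }
};
```

Relative URIs would then be resolved against ApplicationBaseContext::get()
whenever nothing more specific (such as the current stylesheet) is available.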
If all that is done, then there is really nothing in URISupport at all (all
that is needed is a single XalanParsedURI with the default context; this
extends to the StylesheetExecutionContext by storing one in that --
resolution is then done relative to the current stylesheet). In fact, the
only thing remaining is the code that sets up XMLURL (getURLFromString).
Dropping URISupport would break a bunch of existing code though (i.e. 3rd
party apps); not sure how much need there is for API compatibility.
I also remember seeing something about hooking the URI munging code; I guess
if going for any of the above, then this might be a nice time to do it. My
brain just turned off (and this is long enough as it is) so I'll hold off on
that to see if it's still a worthwhile thing to consider.
(Oh, and if not going for any of the above, it's a trivial task to simply
call resolve from getURLStringFromString(two arguments)).
Any opinions would be good!
Thanks,
Mark
p.s. bug for this is
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=16737, but I assigned it
to myself so it doesn't seem to have sent any mail to the list ;)