RE: EntityResolver not allowed to do enough

David N Bertoni/Cambridge/IBM Tue, 22 Oct 2002 12:41:19 -0700



> > Not right now, but that may be why Xalan-J created the notion of a
> > URIResolver.
>
> > Again, I think the job of a URIResolver, although I don't know how
> > this fits in with EntityResolvers, because I don't think URIResolvers
> > are only responsible for string manipulation of URIs -- I think they
> > actually do that, _plus_ do the work of an EntityResolver.  In
> > other words, they return an InputSource.
>
> So, if the user installs a URIResolver, they must calculate the complete
> URL themselves or choose not to create the InputSource. That's a bit
> annoying but good enough for me.

Actually, the idea is that you don't even have to create a URL; you can
just return an input source.  This works for situations where the URI isn't
resolved against any base URI, and lets people install just a URIResolver
instead of both a URIResolver and an EntityResolver.

>
> If the user is responsible for URI resolution, are they also responsible
> for maintaining the base URI? Is the base URI just the URI of the
> "parent" or the thing that we are resolving or must it be normalized in
> some way e.g.
>
> document('http://xml.apache.org/foo/foo.xml')
>
> Is the base for relative document calls now
> <http://xml.apache.org/foo/foo.xml> or <http://xml.apache.org/foo/>?
>
> I ask because generating base URIs might be difficult for custom
> schemes.

Relative URIs for the document() function are resolved differently
according to the number of arguments provided:

   http://www.w3.org/TR/xslt#document

It's pretty complicated, but in your example, the URI is absolute, so it's
never resolved against any base URI.  The normalization aspect is important
because we use the fully normalized URI as a key to store the parsed
document.  That's because we must return the same document if the
document() function is called with the same argument.  Also, there is no
state maintained between calls to the document() function, so one call will
not affect other calls.
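
On the question of whether the base is the document URI or its directory: under generic URI resolution the base is the full URI of the document, and the resolution step itself drops the last path segment, so both forms give the same result.  A small sketch, using Python's urllib.parse.urljoin purely for illustration (this is not Xalan code, and the relative name bar.xml is made up):

```python
from urllib.parse import urljoin

base = "http://xml.apache.org/foo/foo.xml"

# A relative reference replaces the last path segment of the base URI.
relative_result = urljoin(base, "bar.xml")

# Using the directory as the base gives the same answer.
directory_result = urljoin("http://xml.apache.org/foo/", "bar.xml")

# An absolute URI is returned unchanged; the base is never consulted.
absolute_result = urljoin(base, "http://xml.apache.org/foo/foo.xml")

print(relative_result)
print(directory_result)
print(absolute_result)
```

So the caller only needs to remember the URI of the document it loaded, not a separately computed "directory" URI.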

If we implement URIResolver, there's no need for the URIResolver instance
to maintain anything.  The only rule is that it must behave
deterministically during a particular transformation (so we don't parse the
document a second time).  That's why the URIResolver has to provide a "URI"
that we can use as a key for the document.  You can think of URIResolver as
EntityResolver + "thing that can deal with custom URI schemes, etc."  Does
that make sense?
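
To make the "URI as a key" idea concrete, here is a rough sketch in Python (for illustration only).  Everything in it is hypothetical: the resolver signature, the DocumentCache class and the dictionary standing in for a parsed document are assumptions, not anything that exists in Xalan or Xerces.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical resolver signature: (href, base) -> (key_uri, source_text),
# or None if the resolver declines to handle the URI.
Resolver = Callable[[str, str], Optional[Tuple[str, str]]]


class DocumentCache:
    """Caches parsed documents for the lifetime of one transformation."""

    def __init__(self, resolver: Resolver) -> None:
        self._resolver = resolver
        self._documents: Dict[str, dict] = {}
        self.parse_count = 0

    def load(self, href: str, base: str) -> Optional[dict]:
        resolved = self._resolver(href, base)
        if resolved is None:
            return None  # resolver declined; the caller uses its default path
        key_uri, text = resolved
        if key_uri not in self._documents:
            # Stand-in for real parsing; happens once per distinct key.
            self.parse_count += 1
            self._documents[key_uri] = {"uri": key_uri, "text": text}
        return self._documents[key_uri]


def demo_resolver(href: str, base: str) -> Optional[Tuple[str, str]]:
    # Handles a made-up custom scheme and uses the href itself as the key.
    if href.startswith("scheme:"):
        return href, "<doc>" + href + "</doc>"
    return None


cache = DocumentCache(demo_resolver)
first = cache.load("scheme:foo/bar", "http://xml.apache.org/foo/foo.xml")
second = cache.load("scheme:foo/bar", "http://xml.apache.org/other.xml")
print(first is second, cache.parse_count)
```

The resolver keeps no state of its own; it only has to return the same key for the same input during one transformation, which is what lets the cache hand back the same document.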

> > This should only happen if the URI has the file scheme or doesn't have
> > a scheme, in which case we assume it's a file URL.  If we're trying to
> > do the realpath() thing when there's a scheme present, then that's a bug.
>
> In both XMLURL and URISupport, unrecognized schemes (i.e. schemes other
> than http, ftp and file) are considered to be no scheme at all. That was
> the original problem that I reported:
>
> scheme:foo/bar => file://<base-path>/scheme:foo/bar
>
> On Linux, that will explode due to realpath. On Windows you can at least
> try to extract your original URL amongst the garbage.

OK, then that's a bug, and we should fix it.  We should only attempt
realpath() on URIs that have the file scheme or no scheme at all.
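
Roughly, the check I have in mind looks like the sketch below.  It is written in Python only to show the logic; the function names are made up, and the real fix would of course live in the C++ URI code.

```python
import re

# Generic URI scheme syntax: a letter followed by letters, digits,
# "+", "-" or ".", terminated by a colon.
_SCHEME = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")


def uri_scheme(uri):
    """Returns the lower-cased scheme of uri, or None if it has none."""
    match = _SCHEME.match(uri)
    return match.group(1).lower() if match else None


def should_use_realpath(uri):
    """True only for file URIs and for scheme-less (assumed file) paths."""
    scheme = uri_scheme(uri)
    return scheme is None or scheme == "file"


for candidate in ("foo/bar.xml", "file:///tmp/foo.xml",
                  "http://xml.apache.org/foo/foo.xml", "scheme:foo/bar"):
    print(candidate, should_use_realpath(candidate))
```

One caveat with this simple test: a Windows path such as C:\foo looks like a URI with the scheme "c", so drive letters would need special handling.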

> Of course, XMLUri is totally different, as one would expect given its
> totally different name :-)
>
> > OK, now I'm _really_ offended!  ;-)
>
> I doubt it :-)

OK, so I'm really not...

> Do you really understand the relationship between the various URI
> processing systems?

I think I do, but I remember being wrong once or twice before, so I might
be wrong again... ;-)

Let's talk more about how to proceed.  I think the best way is with a new
URIResolver class that gets first crack at the URI, before we run any of
the URI normalization code or call the EntityResolver.  How does that sound?
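
As a strawman for the ordering, sketched in Python for brevity: the function name, the callback signatures and the use of urljoin as a stand-in for our normalization code are all assumptions made for illustration, not a proposed API.

```python
from urllib.parse import urljoin


def open_source(href, base, uri_resolver=None, entity_resolver=None):
    """Returns (stage, source), where stage names who supplied the input."""
    # 1. The URIResolver sees the raw, un-normalized URI first.
    if uri_resolver is not None:
        source = uri_resolver(href, base)
        if source is not None:
            return ("uri_resolver", source)

    # 2. Otherwise fall back to normalization (urljoin is a stand-in here).
    system_id = urljoin(base, href)

    # 3. Then give the EntityResolver its usual chance.
    if entity_resolver is not None:
        source = entity_resolver(system_id)
        if source is not None:
            return ("entity_resolver", source)

    # 4. Nobody claimed it; open the normalized URI the default way.
    return ("default", system_id)


def custom_scheme_resolver(href, base):
    # Hypothetical resolver that only handles a made-up custom scheme.
    return "custom-source" if href.startswith("scheme:") else None


print(open_source("scheme:foo/bar", "http://xml.apache.org/foo/foo.xml",
                  custom_scheme_resolver))
print(open_source("bar.xml", "http://xml.apache.org/foo/foo.xml",
                  custom_scheme_resolver))
```

A resolver that returns nothing leaves the existing behavior untouched, so installing one should not change anything for URIs it does not care about.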

Dave