Thanks, I think I "get it" now. It looks like EntityResolver2 is the new & improved version to fix some of the limitations of EntityResolver (like the one I'm running up against). But given that the Version in EntityResolver2's javadocs is "TBD" and the Xerces wrapper isn't in any released version yet (at least, I assume the Xerces-J_01052005 CVS tag is a between-versions build?), I won't hold my breath waiting for Excalibur and then Cocoon to start using the new version.

org.apache.xerces.util.XMLCatalogResolver looks interesting. I see it has a useLiteralSystemId property, but that is only used by resolveIdentifier()/resolveEntity(XMLResourceIdentifier), i.e. only the XNI methods that throw XNIException, and not by resolveEntity(String,String), so I'm not sure it would help me in this instance. Besides, given the way Excalibur componentises EntityResolver, I suspect my current plan (subclassing their DefaultResolver, which already uses XML Commons' CatalogResolver) will be quicker than trying to plug the XMLCatalogResolver into it. Something to play around with if I have time on my hands, though.
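A minimal sketch of the stop-gap described in the quoted message further down (rewriting system IDs that end in dcr4.5.dtd before the catalog lookup). It is written as a plain delegating org.xml.sax.EntityResolver rather than as a subclass of Excalibur's DefaultResolver, whose API isn't shown in this thread. The class name SuffixNormalisingResolver is hypothetical; the delegate would typically be XML Commons' CatalogResolver (org.apache.xml.resolver.tools.CatalogResolver), which implements EntityResolver.

```java
import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical wrapper: trims any expanded system ID that ends in a known
// DTD file name back to the bare name, then hands off to the real resolver
// (e.g. a catalog resolver whose entry is keyed on "dcr4.5.dtd").
class SuffixNormalisingResolver implements EntityResolver {
    private final EntityResolver delegate;
    private final String dtdName;

    SuffixNormalisingResolver(EntityResolver delegate, String dtdName) {
        this.delegate = delegate;
        this.dtdName = dtdName;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        String id = systemId;
        // The parser passes the expanded (absolute) system ID here, e.g.
        // file:/any/dir/dcr4.5.dtd, so reduce it to the form the catalog lists.
        if (id != null && id.endsWith("/" + dtdName)) {
            id = dtdName;
        }
        return delegate.resolveEntity(publicId, id);
    }
}
```

Because the expanded system ID is a URI, matching on "/" plus the file name (rather than the bare name) keeps an unrelated file such as olddcr4.5.dtd from being rewritten.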


Andrew.

From: Michael Glavassevich <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: EntityResolverWrapper
Date: Mon, 28 Feb 2005 09:16:46 -0500

Hi Andrew,

EntityResolverWrapper is a wrapper for org.xml.sax.EntityResolver. The
system ID passed to EntityResolver.resolveEntity() is the "expanded system
ID". Specifically the docs for resolveEntity() [1] say: "if the system
identifier is a URL, the SAX parser must resolve it fully before reporting
it to the application" and that's exactly what the parser does. The other
wrapper is for EntityResolver2 [2], whose resolveEntity() method takes the
literal system ID along with a base URI, so yes the two resolvers behave
differently. Xerces has a utility class called
org.apache.xerces.util.XMLCatalogResolver which uses the XML commons
catalog resolver. You may want to have a look at it.
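To make the difference concrete: the expanded system ID is the literal one resolved against the base URI of the document that contains it. The sketch below uses java.net.URI as a stand-in for that step (Xerces has its own expansion code, so treat this as an approximation); the class name SystemIdExpansion and the base path file:/data/docs/record.xml are made-up examples.

```java
import java.net.URI;

// Illustration only: approximates how a literal system ID becomes the
// "expanded" one by resolving it against the containing document's URI.
class SystemIdExpansion {
    static String expand(String baseUri, String literalSystemId) {
        // Relative references are resolved per RFC 2396; absolute ones are returned as-is.
        return URI.create(baseUri).resolve(literalSystemId).toString();
    }
}
```

With base "file:/data/docs/record.xml" and literal "dcr4.5.dtd", expand returns "file:/data/docs/dcr4.5.dtd". That is the kind of value a plain EntityResolver receives, and why a catalog entry keyed on the bare "dcr4.5.dtd" doesn't match; an EntityResolver2 receives "dcr4.5.dtd" and the base URI as separate arguments.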

Hope that helps.

[1] http://www.saxproject.org/apidoc/org/xml/sax/EntityResolver.html#resolveEntity(java.lang.String,%20java.lang.String)
[2] http://www.saxproject.org/apidoc/org/xml/sax/ext/EntityResolver2.html#resolveEntity(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)
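For comparison, a sketch of matching on the literal system ID through EntityResolver2, assuming a parser that supports SAX2 Extensions 1.1 (where the feature http://xml.org/sax/features/use-entity-resolver2 defaults to true). LiteralIdResolver and the LOCAL_DTD location are hypothetical placeholders.

```java
import java.io.IOException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.EntityResolver2;

// Hypothetical resolver that maps the *literal* system ID "dcr4.5.dtd"
// to a local copy, regardless of where the referencing document lives.
class LiteralIdResolver implements EntityResolver2 {
    static final String DTD_NAME = "dcr4.5.dtd";
    // Placeholder location; substitute wherever the DTD is actually deployed.
    static final String LOCAL_DTD = "file:/webapp/WEB-INF/entities/interwoven/dcr4.5.dtd";

    // EntityResolver2 entry point: literal system ID and base URI arrive separately.
    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId)
            throws SAXException, IOException {
        if (DTD_NAME.equals(systemId)) {
            InputSource src = new InputSource(LOCAL_DTD);
            src.setPublicId(publicId);
            return src;
        }
        return null; // let the parser fall back to its default resolution
    }

    // No synthetic external subset for documents that lack a DOCTYPE.
    @Override
    public InputSource getExternalSubset(String name, String baseURI) {
        return null;
    }

    // Plain SAX entry point (used when EntityResolver2 support is off);
    // delegate with no entity name or base URI.
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        return resolveEntity(null, publicId, null, systemId);
    }
}
```

Note that an expanded ID such as file:/data/docs/dcr4.5.dtd is deliberately not matched here; the sketch relies on the parser passing the literal form.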

"Andrew Stevens" <[EMAIL PROTECTED]> wrote on 02/28/2005 08:47:52 AM:

> In org.apache.xerces.util.EntityResolverWrapper.resolveEntity(XMLResourceIdentifier
> resourceIdentifier), it has the line
>     String sysId = resourceIdentifier.getExpandedSystemId();
> Is there some particular reason this uses the expanded system ID rather than
> using getLiteralSystemId()?
>
> I've got a problem with some XML files I'm processing with Cocoon. The
> files all contain a DOCTYPE that uses a relative path for the system ID,
> i.e. <!DOCTYPE record SYSTEM "dcr4.5.dtd">. The documents are created by
> another application, and I can't affect what it puts in there. Trying to
> read the files generates a parser error since the DTD isn't present in the
> directory containing the documents; no problem, I thought, just use a
> suitable entry in the catalog used by Cocoon's EntityResolver. So,
> following the other entries, I added
>     SYSTEM "dcr4.5.dtd" "interwoven/dcr4.5.dtd"
> and copied the DTD into WEB-INF\entities\interwoven; however, it still
> doesn't find the DTD. Turning up the logging (and this is where it becomes
> more relevant to Xerces than Cocoon, and why I'm asking here rather than
> cocoon-user) I discovered that the system ID being passed in to the catalog
> resolver already had the full path to the file, so it's not matching the
> above entry in the catalog. Since the path to the documents could be more
> or less anything, I can't use a (prefix-based) rewrite entry in the
> catalog; likewise it's impractical to include a system entry for every
> possible path, since I don't know in advance what they're going to be.
> Digging through the Cocoon & Xerces source code, I discovered the path
> being received by the catalog resolver has come from the
> EntityResolverWrapper, i.e. the resourceIdentifier.getExpandedSystemId() I
> mentioned above. Presumably, if that had used getLiteralSystemId()
> instead, the catalog resolver would have received just "dcr4.5.dtd" for
> the system ID rather than the full path, and would have matched it okay.
> But I'm wary of changing it myself, since I don't know what else might be
> affected (and I'd rather avoid using a custom-built Xerces in our Cocoon
> app, to minimise the risk of introducing other side-effects).
>
> I notice in the current CVS HEAD there's an EntityResolver2Wrapper class;
> this one does use getLiteralSystemId(). In fact, the latest CVS log message
> on that class says
> "Fixing a bug. The systemId passed to EntityResolver2.resolveEntity may be
> an absolute or relative URI. That is it should be the literal system
> identifier, not the expanded one which resolved from the base URI."
> However, I also found an old (> 2 years) mailing list message
> (http://mail-archives.apache.org/eyebrowse/ReadMsg?listName=xerces-[EMAIL PROTECTED]&msgId=568021)
> which says that
> "The reason Xerces now returns fully-expanded URI's to the Entity resolver
> is that SAX quite explicitly states that this is what XML processors are
> supposed to do."
> So now I'm twice as confused. Do the SAX2 Extensions 1.1 say that
> EntityResolver2 should behave differently from EntityResolver? Or have
> things changed since EntityResolverWrapper switched to using
> getExpandedSystemId(), and should it now be using getLiteralSystemId()
> after all?
>
> In the meantime I can work around my problem by plugging in a custom
> EntityResolver which replaces any system ID ending with "dcr4.5.dtd" with
> just that string, before passing it on to the XML Commons catalog resolver
> as before. But it'd be nice if it could be clarified how exactly Xerces'
> wrapper classes are supposed to work, so I know whether I should be
> raising a bug :-)
>
>
> Andrew.

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]