org.apache.xerces.util.XMLCatalogResolver looks interesting. I see it has a useLiteralSystemId property, but that only gets used by resolveIdentifier/resolveEntity(XMLResourceIdentifier), i.e. only the methods that throw XNIException, and not by resolveEntity(String,String), so I'm not sure it would help me in this instance. Besides, given the way Excalibur componentises EntityResolver, I suspect my current plan (subclassing their DefaultResolver, which already uses XML Commons' CatalogResolver) will be quicker than trying to plug the XMLCatalogResolver into it. Something to play around with if I have time on my hands, though.
Andrew.
From: Michael Glavassevich <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: EntityResolverWrapper
Date: Mon, 28 Feb 2005 09:16:46 -0500
Hi Andrew,
EntityResolverWrapper is a wrapper for org.xml.sax.EntityResolver. The system ID passed to EntityResolver.resolveEntity() is the "expanded system ID". Specifically, the docs for resolveEntity() [1] say: "if the system identifier is a URL, the SAX parser must resolve it fully before reporting it to the application", and that's exactly what the parser does. The other wrapper is for EntityResolver2 [2], whose resolveEntity() method takes the literal system ID along with a base URI, so yes, the two resolvers behave differently. Xerces has a utility class called org.apache.xerces.util.XMLCatalogResolver which uses the XML Commons catalog resolver. You may want to have a look at it.
Hope that helps.
[1] http://www.saxproject.org/apidoc/org/xml/sax/EntityResolver.html#resolveEntity(java.lang.String,%20java.lang.String)
[2] http://www.saxproject.org/apidoc/org/xml/sax/ext/EntityResolver2.html#resolveEntity(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)
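To make the "expanded" versus "literal" distinction above concrete, here is a minimal sketch of what expansion amounts to: the literal system ID from the DOCTYPE is resolved against the base URI of the document that contains it. The class name SystemIdExpansion, the expand() helper and the example.org base URI are made up for illustration; this uses only java.net.URI and is not the Xerces code path.

```java
import java.net.URI;

// Illustrative only: mimics how a literal system ID becomes an expanded one.
class SystemIdExpansion {

    // Resolve the literal system ID against the URI of the referring document.
    static String expand(String baseUri, String literalSystemId) {
        return URI.create(baseUri).resolve(literalSystemId).toString();
    }

    public static void main(String[] args) {
        // The base URI is an assumed location for the XML document.
        String expanded = expand("http://example.org/docs/record.xml", "dcr4.5.dtd");
        System.out.println(expanded); // prints http://example.org/docs/dcr4.5.dtd
    }
}
```

An EntityResolver receives something like the expanded string, whereas an EntityResolver2 receives "dcr4.5.dtd" and the base URI as separate arguments.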
"Andrew Stevens" <[EMAIL PROTECTED]> wrote on 02/28/2005 08:47:52 AM:
> In org.apache.xerces.util.EntityResolverWrapper.resolveEntity(XMLResourceIdentifier resourceIdentifier), it has the line
>     String sysId = resourceIdentifier.getExpandedSystemId();
> Is there some particular reason this uses the expanded system ID rather than using getLiteralSystemId()?
>
> I've got a problem with some XML files I'm processing with Cocoon. The files all contain a DOCTYPE that uses a relative path for the system ID, i.e.
>     <!DOCTYPE record SYSTEM "dcr4.5.dtd">
> The documents are created by another application, and I can't affect what it puts in there. Trying to read the files generates a parser error since the DTD isn't present in the directory containing the documents; no problem, I thought, just use a suitable entry in the catalog used by Cocoon's EntityResolver. So, following the other entries, I added
>     SYSTEM "dcr4.5.dtd" "interwoven/dcr4.5.dtd"
> and copied the DTD into WEB-INF\entities\interwoven; however, it still doesn't find the DTD. Turning up the logging (and this is where it becomes more relevant to Xerces than Cocoon, and why I'm asking here rather than cocoon-user) I discovered that the system ID being passed in to the catalog resolver already had the full path to the file, so it's not matching the above entry in the catalog. Since the path to the documents could be more or less anything, I can't use a (prefix-based) rewrite entry in the catalog; likewise it's impractical to include a system entry for every possible path, since I don't know in advance what they're going to be. Digging through the Cocoon & Xerces source code, I discovered the path being received by the catalog resolver has come from the EntityResolverWrapper, i.e. the resourceIdentifier.getExpandedSystemId() I mentioned above. Presumably, if that had used getLiteralSystemId() instead, the catalog resolver would have received just "dcr4.5.dtd" for the system ID rather than the full path, and would have matched it okay. But I'm wary of changing it myself, since I don't know what else might be affected (and I'd rather avoid using a custom-built Xerces in our Cocoon app, to minimise the risk of introducing other side-effects).
>
> I notice in the current CVS HEAD, there's an EntityResolver2Wrapper class; this one does use getLiteralSystemId(). In fact, the latest CVS log message on that class says
> "Fixing a bug. The systemId passed to EntityResolver2.resolveEntity may be an absolute or relative URI. That is it should be the literal system identifier, not the expanded one which resolved from the base URI."
> However, I also found an old (> 2 years) mailing list message
> (http://mail-archives.apache.org/eyebrowse/ReadMsg?listName=xerces-[EMAIL PROTECTED]&msgId=568021)
> which says that
> "The reason Xerces now returns fully-expanded URI's to the Entity resolver is that SAX quite explicitly states that this is what XML processors are supposed to do."
> So now I'm twice as confused. Do the SAX2 Extensions 1.1 say that EntityResolver2 should behave differently from EntityResolver? Or have things changed since EntityResolverWrapper switched to using getExpandedSystemId(), and should it now be using getLiteralSystemId() after all?
>
> In the meantime I can work around my problem by plugging in a custom EntityResolver which replaces any system IDs ending with "dcr4.5.dtd" with just that string, before passing it on to the XML Commons catalog resolver as before. But it'd be nice if it could be clarified how exactly Xerces' wrapper classes are supposed to work, so I know if I should be raising a bug :-)
>
> Andrew.
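The workaround described at the end of the quoted message could look roughly like the sketch below. The class name TrailingNameResolver, its constructor and the toLiteral() helper are hypothetical; the delegate stands in for whatever resolver is already configured (for example the catalog resolver). As a small deviation from a plain "ends with" test, it only matches when the file name is a whole path segment, so that an ID such as ".../xdcr4.5.dtd" is left alone.

```java
import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical wrapper: names are illustrative, not part of Xerces or Cocoon.
class TrailingNameResolver implements EntityResolver {
    private final EntityResolver delegate; // e.g. the existing catalog resolver
    private final String fileName;         // e.g. "dcr4.5.dtd"

    TrailingNameResolver(EntityResolver delegate, String fileName) {
        this.delegate = delegate;
        this.fileName = fileName;
    }

    // Collapse an expanded system ID back to the bare file name when its
    // last path segment is that file name; otherwise return it unchanged.
    static String toLiteral(String systemId, String fileName) {
        if (systemId == null) {
            return null;
        }
        boolean matches = systemId.equals(fileName)
                || systemId.endsWith("/" + fileName);
        return matches ? fileName : systemId;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // Hand the shortened ID to the delegate so a catalog entry such as
        // SYSTEM "dcr4.5.dtd" "interwoven/dcr4.5.dtd" has a chance to match.
        return delegate.resolveEntity(publicId, toLiteral(systemId, fileName));
    }
}
```

Such a wrapper would be installed wherever the existing EntityResolver is registered, with that resolver passed in as the delegate.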
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
E-mail: [EMAIL PROTECTED]