Hi Andrew,

EntityResolverWrapper is a wrapper for org.xml.sax.EntityResolver. The 
system ID passed to EntityResolver.resolveEntity() is the "expanded system 
ID". Specifically, the docs for resolveEntity() [1] say: "if the system 
identifier is a URL, the SAX parser must resolve it fully before reporting 
it to the application", and that's exactly what the parser does. The other 
wrapper is for EntityResolver2 [2], whose resolveEntity() method takes the 
literal system ID along with a base URI, so yes, the two resolvers behave 
differently. Xerces has a utility class called 
org.apache.xerces.util.XMLCatalogResolver which uses the XML Commons 
catalog resolver. You may want to have a look at it.
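
To make the difference concrete, here is a minimal sketch that parses the same document once with a plain EntityResolver and once with an EntityResolver2, recording the system ID each one receives. The class name SystemIdDemo, the document URI and the stub DTD are made up for illustration, and it assumes the default JAXP parser is the Xerces-derived one bundled with the JDK; other parsers may behave differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.EntityResolver2;
import org.xml.sax.helpers.DefaultHandler;

public class SystemIdDemo {
    // Hypothetical document location; nothing is actually read from this path.
    static final String DOC_URI = "file:///tmp/docs/record.xml";
    static final String XML = "<!DOCTYPE record SYSTEM \"dcr4.5.dtd\"><record/>";
    // Stub DTD handed back by the resolvers so no file access is needed.
    static final String DTD = "<!ELEMENT record EMPTY>";

    /** Parses XML with the given resolver installed (checked exceptions are wrapped). */
    static void parseWith(EntityResolver resolver) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // Standard SAX2 feature; it is normally on by default, set here for clarity.
            reader.setFeature("http://xml.org/sax/features/use-entity-resolver2", true);
            reader.setEntityResolver(resolver);
            reader.setContentHandler(new DefaultHandler());
            InputSource in = new InputSource(new StringReader(XML));
            in.setSystemId(DOC_URI); // base URI against which "dcr4.5.dtd" is expanded
            reader.parse(in);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** System ID reported to a plain EntityResolver (the expanded form). */
    static String seenByEntityResolver() {
        List<String> seen = new ArrayList<>();
        parseWith((publicId, systemId) -> {
            seen.add(systemId);
            return new InputSource(new StringReader(DTD));
        });
        return seen.get(0);
    }

    /** Returns { systemId, baseURI } as reported to an EntityResolver2. */
    static String[] seenByEntityResolver2() {
        String[] seen = new String[2];
        parseWith(new EntityResolver2() {
            @Override
            public InputSource getExternalSubset(String name, String baseURI) {
                return null; // the document already has a DOCTYPE
            }

            @Override
            public InputSource resolveEntity(String name, String publicId,
                                             String baseURI, String systemId) {
                seen[0] = systemId;
                seen[1] = baseURI;
                return new InputSource(new StringReader(DTD));
            }

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                return null; // not expected to be used when EntityResolver2 is active
            }
        });
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("EntityResolver  systemId: " + seenByEntityResolver());
        String[] r2 = seenByEntityResolver2();
        System.out.println("EntityResolver2 systemId: " + r2[0] + ", baseURI: " + r2[1]);
    }
}
```

With the plain EntityResolver the reported system ID should be the relative name resolved against the document's location; with EntityResolver2 it should be the bare "dcr4.5.dtd", with the document's location supplied separately as the base URI.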

Hope that helps.

[1] http://www.saxproject.org/apidoc/org/xml/sax/EntityResolver.html#resolveEntity(java.lang.String,%20java.lang.String)
[2] http://www.saxproject.org/apidoc/org/xml/sax/ext/EntityResolver2.html#resolveEntity(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)

"Andrew Stevens" <[EMAIL PROTECTED]> wrote on 02/28/2005 08:47:52 AM:

> In org.apache.xerces.util.EntityResolverWrapper.resolveEntity(XMLResourceIdentifier
> resourceIdentifier), it has the line
>     String sysId = resourceIdentifier.getExpandedSystemId();
> Is there some particular reason this uses the expanded system ID rather than
> using getLiteralSystemId()?
> 
> I've got a problem with some XML files I'm processing with Cocoon.  The
> files all contain a DOCTYPE that uses a relative path for the system ID,
> i.e. <!DOCTYPE record SYSTEM "dcr4.5.dtd">.  The documents are created by
> another application, and I can't affect what it puts in there.  Trying to
> read the files generates a parser error since the DTD isn't present in the
> directory containing the documents; no problem, I thought, just use a
> suitable entry in the catalog used by Cocoon's EntityResolver.  So,
> following the other entries, I added
>     SYSTEM "dcr4.5.dtd" "interwoven/dcr4.5.dtd"
> and copied the DTD into WEB-INF\entities\interwoven; however, it still
> doesn't find the DTD.  Turning up the logging (and this is where it becomes
> more relevant to Xerces than Cocoon, and why I'm asking here rather than
> cocoon-user) I discovered that the system ID being passed in to the catalog
> resolver already had the full path to the file, so it's not matching the
> above entry in the catalog.  Since the path to the documents could be more
> or less anything, I can't use a (prefix-based) rewrite entry in the catalog;
> likewise it's impractical to include a system entry for every possible path,
> since I don't know in advance what they're going to be.  Digging through the
> Cocoon & Xerces source code, I discovered the path being received by the
> catalog resolver has come from the EntityResolverWrapper, i.e. the
> resourceIdentifier.getExpandedSystemId() I mentioned above.  Presumably, if
> that had used getLiteralSystemId() instead, the catalog resolver would have
> received just "dcr4.5.dtd" for the system ID rather than the full path, and
> would have matched it okay.  But I'm wary of changing it myself, since I
> don't know what else might be affected (and I'd rather avoid using a
> custom-built Xerces in our Cocoon app, to minimise the risk of introducing
> other side-effects).
> 
> I notice in the current CVS HEAD, there's an EntityResolver2Wrapper class;
> this one does use getLiteralSystemId(), in fact the latest CVS log message
> on that class says
> "Fixing a bug. The systemId passed to EntityResolver2.resolveEntity may be
> an absolute or relative URI. That is it should be the literal system
> identifier, not the expanded one which resolved from the base URI."
> However, I also found an old (> 2 years) mailing list message
> (http://mail-archives.apache.org/eyebrowse/ReadMsg?listName=xerces-[EMAIL PROTECTED]&msgId=568021)
> which says that
> "The reason Xerces now returns fully-expanded URI's to the Entity resolver
> is that SAX quite explicitly states that this is what XML processors are
> supposed to do."
> So now I'm twice as confused.  Do the SAX2 Extensions 1.1 say that
> EntityResolver2 should behave differently from EntityResolver?  Or have
> things changed since EntityResolverWrapper switched to using
> getExpandedSystemId(), and should it now be using getLiteralSystemId() after
> all?
> 
> In the meantime I can work around my problem by plugging in a custom
> EntityResolver which replaces any system IDs ending with "dcr4.5.dtd" with
> just that string, before passing it on to the XML commons catalog resolver
> as before.  But it'd be nice if it could be clarified how exactly Xerces'
> wrapper classes are supposed to work, so I know if I should be raising a bug
> :-)
> 
> 
> Andrew.
> --
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
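
The interim workaround described in the quoted message (a custom EntityResolver that cuts the expanded system ID back to the bare DTD name before consulting the catalog) could be sketched roughly as follows. TrailingNameResolver and its normalize() helper are hypothetical names, not part of Xerces or Cocoon, and the delegate stands in for whatever catalog resolver is already configured.

```java
import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: maps ".../dcr4.5.dtd" back to "dcr4.5.dtd" before delegating. */
public class TrailingNameResolver implements EntityResolver {
    private final String dtdName;          // e.g. "dcr4.5.dtd"
    private final EntityResolver delegate; // e.g. the existing catalog resolver

    public TrailingNameResolver(String dtdName, EntityResolver delegate) {
        this.dtdName = dtdName;
        this.delegate = delegate;
    }

    /** Returns dtdName if systemId is that name or a path ending in "/" + dtdName. */
    static String normalize(String systemId, String dtdName) {
        if (systemId != null
                && (systemId.equals(dtdName) || systemId.endsWith("/" + dtdName))) {
            return dtdName;
        }
        return systemId; // anything else is passed through unchanged
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // The delegate sees the bare name, so a catalog entry such as
        //   SYSTEM "dcr4.5.dtd" "interwoven/dcr4.5.dtd"
        // has a chance to match regardless of where the document lives.
        return delegate.resolveEntity(publicId, normalize(systemId, dtdName));
    }
}
```

Matching on "/" plus the name, rather than on the bare suffix, is a deliberate choice in this sketch so that an unrelated file such as "olddcr4.5.dtd" is not rewritten by accident.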

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
