If you know which schemas are going to be used, I would *strongly* recommend preparsing the schemas and caching them. A good FAQ page [1] describes how to do this. It will greatly improve the performance of your application, because it's expensive to parse the same set of schemas again and again whenever an XML document needs to be validated.
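
As a rough illustration of the caching idea above, here is a minimal sketch. It does not use the Xerces grammar-pool approach that the FAQ describes; it swaps in the standard javax.xml.validation API, which was added to the JDK after this thread was written. The class name SchemaCache, its two methods, and the string cache key are hypothetical names chosen for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical helper: compiles each schema once and reuses the result. */
final class SchemaCache {
    // A compiled Schema is thread-safe, so one instance can be shared by all callers.
    private static final Map<String, Schema> CACHE = new ConcurrentHashMap<>();

    /** Returns the compiled schema for key; xsdText is parsed only on the first call. */
    static Schema get(String key, String xsdText) {
        return CACHE.computeIfAbsent(key, k -> {
            try {
                SchemaFactory factory =
                        SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
                return factory.newSchema(new StreamSource(new StringReader(xsdText)));
            } catch (SAXException e) {
                throw new IllegalArgumentException("Schema does not compile: " + key, e);
            }
        });
    }

    /** Validates one instance document against an already compiled schema. */
    static boolean isValid(Schema schema, String xml) {
        try {
            // Validators are not thread-safe, so a fresh one is created per document.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document violates the schema or is not well-formed
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The expensive step (parsing the schema) happens once per key; each later validation only pays for creating a Validator.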
As we all know, EntityResolver is from SAX, and SAX requires the system ID to be absolutized before the custom EntityResolver is called. By default, Xerces treats all locations as files. (Is there a better *default* behavior?) Applications can change this behavior by registering an EntityResolver.

Since we are required to absolutize system IDs, we need a base for that. We try our best to find a meaningful base, but if that fails, there is no better choice than the user.dir property. If you don't like it, you can ignore whatever comes before the last '/' in the system ID.

[1] http://xml.apache.org/xerces2-j/faq-grammars.html

Cheers,
Sandy Gao
Software Developer, IBM Canada
(1-905) 413-3255
[EMAIL PROTECTED]


"Milan Trninic" <[EMAIL PROTECTED]nc.com>
10/31/2002 11:19 AM
Please respond to xerces-j-user

To: <[EMAIL PROTECTED]>, <[email protected]>
cc:
Subject: Re: problem: xsd schemas that are included or imported must be files !

I had the same problem. The Xerces API documentation says that the parser is required to resolve the ID to a meaningful URI before passing it to the application (custom EntityResolver). I was trying to find a way to intercept that, but gave up for lack of time. I would certainly like to see if there is a way to do it. Otherwise, the only solution I see is to preparse all the schemas and provide them as external schemas to the parser.

Milan

----- Original Message -----
From: Evyatar Kafkafi
To: [email protected] ; [EMAIL PROTECTED]
Sent: Thursday, October 31, 2002 5:24 AM
Subject: problem: xsd schemas that are included or imported must be files !

Hi all,

My problem might be a feature of Xerces, or a feature of XML Schema - I am not sure. In any case, I think this feature is wrong.

In a schema like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsd:schema targetNamespace="http://schemas.devxpert.com/order"
                xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                xmlns:my="my_imported_namespace"
                xmlns:dx="http://schemas.devxpert.com/order"
                elementFormDefault="qualified">
      <xsd:import namespace="my_imported_namespace" schemaLocation="import.xsd"/>
      ...
    </xsd:schema>

When the parser parses the xsd:import, it takes the schemaLocation and assumes it is a file on the file system! This approach is quite reasonable when you work with small systems, but not at large scale.

In the system I'm currently developing, we don't keep the schemas in the file system at all. We keep XML schemas in the database. There are many good reasons for this. When we use a parser to parse an XML instance and validate it against a schema, we take the schema (and all its imported and included xsd files) directly from the database. We don't want to write them to the file system and then tell the parser where they are located on the disk.

Xerces supports this behavior by using the InputSource class, which allows the input to come from a string, not from a file. In addition, Xerces has the EntityResolver, which allows me to supply the parser with an InputSource for the entity "import.xsd". But before calling my custom entity resolver, Xerces tries to expand the value "import.xsd" to a meaningful URI, and its last resort is to assume it's in the "user.dir" directory [System.getProperty("user.dir")].

So you see, Xerces assumes that "import.xsd" is a file. Why is that? Is it Xerces, or is it defined by the XML Schema Recommendation? And in any case, could it be changed? Isn't it much more general (and thus elegant) to allow entities to come from places other than the file system?

Evyatar.
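
The workaround suggested in the reply (ignore whatever comes before the last '/' in the absolutized system ID) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name StoreBackedResolver and the helper lastSegment are hypothetical, and an in-memory Map stands in for the database mentioned in the original question. It uses only the SAX interfaces org.xml.sax.EntityResolver and org.xml.sax.InputSource.

```java
import java.io.StringReader;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver that serves schema text from a store keyed by file name. */
final class StoreBackedResolver implements EntityResolver {
    // Assumption: a Map stands in for the database; keys look like "import.xsd".
    private final Map<String, String> store;

    StoreBackedResolver(Map<String, String> store) {
        this.store = store;
    }

    /** Drops everything up to and including the last '/' of the system ID. */
    static String lastSegment(String systemId) {
        int slash = systemId.lastIndexOf('/');
        return slash < 0 ? systemId : systemId.substring(slash + 1);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId == null) {
            return null;
        }
        // The parser has already absolutized the ID (possibly against user.dir),
        // so only the trailing name is used as the lookup key.
        String text = store.get(lastSegment(systemId));
        if (text == null) {
            return null; // unknown entity: let the parser fall back to its default
        }
        InputSource source = new InputSource(new StringReader(text));
        source.setSystemId(systemId); // keep the ID so nested relative references still resolve
        return source;
    }
}
```

An instance would be registered through the parser's usual setEntityResolver call. Note that looking up by the last path segment alone assumes schema file names are unique across the store.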
