Armin Pfarr wrote:
>
> The org.apache.xerces.readers.XCatalog class uses an invalid XCatalog-DTD.
>
> 1) The following line in the DTD mixes CDATA and Default values
>    "Version CDATA (0.1 | 0.2) "0.2""
>    This is incorrect according to the spec. You should replace it
>    with "Version (0.1 | 0.2) "0.2""

Good catch. This has been broken for quite a while. The change for
this is now in CVS.

> 2) The definition "<!ELEMENT XCatalog ANY>" can also be considered
>    a bad habit.

This change will be considered when we update the XCatalog
implementation to the current version. Would you like to work on
this code? We could really use the help.

> 3) The declaration "<?xml version="1.0" encoding="US-ASCII"?>"
>    is a form of american imperialism, I personally don't like.
>    This little line forces me to use 7-BIT clean filesnames for
>    my Windows filesnames, because a
[snip]

The US-ASCII encoding refers to the contents of the DTD file and
the DTD file *alone*. There are no European characters in the DTD,
so I can specify the encoding as US-ASCII. ASCII is also the
fastest encoding to scan: its values coincide with code points
0-127 in Unicode, so no character conversion is required. Even
leaving the encoding unspecified and letting the parser default to
UTF-8 slows down processing.

Specifying an encoding in an XML document or entity, such as the
DTD, has nothing to do with filenames in this case. Now, if *your*
document instance referred to a DTD that had Latin 1 characters in
the filename, then the encoding of the document should be specified
as "ISO-8859-1".

> 4) I would also suggest that you don't use mixed cases for the
>    attribute definitions. This is only a potential source of
>    errors, especially since the tag-names are not really
>    compound words

We did not invent XCatalogs. (I believe that they're called XML
Catalogs now.) Therefore, we have no control over the case used in
the element/attribute names.

--
Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]
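
A note on point 3, as a minimal sketch and not anything taken from the Xerces sources: the overlap between US-ASCII and the first 128 Unicode code points can be checked with the standard java.nio.charset API. The class name AsciiOverlapDemo, its two helper methods, and the file name "caf\u00e9.dtd" are hypothetical and exist only for this illustration. The sketch shows only that 7-bit content decodes to the same text under US-ASCII, UTF-8 and ISO-8859-1; it does not measure parser speed.

```java
import java.nio.charset.StandardCharsets;

public class AsciiOverlapDemo {

    // True when every byte is in the 7-bit range 0-127.
    // Java bytes are signed, so values 0x80-0xFF appear as negatives.
    static boolean isSevenBitClean(byte[] bytes) {
        for (byte b : bytes) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    // True when the same bytes decode to the same text under
    // US-ASCII, UTF-8 and ISO-8859-1.
    static boolean decodesIdentically(byte[] bytes) {
        String ascii = new String(bytes, StandardCharsets.US_ASCII);
        String utf8 = new String(bytes, StandardCharsets.UTF_8);
        String latin1 = new String(bytes, StandardCharsets.ISO_8859_1);
        return ascii.equals(utf8) && utf8.equals(latin1);
    }

    public static void main(String[] args) {
        // A declaration from the DTD under discussion: plain 7-bit content.
        byte[] dtdBytes = "<!ELEMENT XCatalog ANY>".getBytes(StandardCharsets.US_ASCII);
        System.out.println("DTD line is 7-bit clean: " + isSevenBitClean(dtdBytes));
        System.out.println("DTD line decodes identically: " + decodesIdentically(dtdBytes));

        // Hypothetical system identifier containing a Latin 1 character,
        // as it would be stored in a document encoded as ISO-8859-1.
        byte[] nameBytes = "caf\u00e9.dtd".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("File name is 7-bit clean: " + isSevenBitClean(nameBytes));
        System.out.println("File name decodes identically: " + decodesIdentically(nameBytes));
    }
}
```

The second pair of checks is the situation described above: once the referring document itself contains a character outside 0-127, the three decodings no longer agree, so that document (not the DTD) needs an encoding declaration that matches its bytes.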