Hello,

The following comments represent a consensus of the developers from IBM involved with Xerces-J.

Version.java, AbstractVersion.java, VersionImpl.java:

The design of the means of querying version information about JAXP causes us considerable concern. As currently specified, we're worried that the information returned by these classes may not correspond with the objects actually returned by the various JAXP factories, because of the differences between the factory mechanisms employed here and those in the other factories. Other problems with the design include the fact that, often--and in the reference implementation in particular--different products implement the transform and parser portions of the API. Hence, it's far from clear what a Version.getImplementationTitle should return in the current framework: should it be Xerces or Xalan for the reference implementation?

We propose that a getVersion method should be included on each of the objects manufacturable by JAXP (e.g., DocumentBuilder and SAXParser). This way, each component of the underlying implementation could specify independently what vendor implemented it and what version of JAXP it implements. This would allow the elimination of both VersionImpl and AbstractVersion, cleaning up the API and making the currently somewhat inscrutable versioning mechanism easy for all to understand.

We also think that not all the methods on the current Version class are necessary. For instance, we aren't aware of any compelling reason to preserve getExtensionName. getImplementationVendorId() seems even more difficult to justify.

SecureProcessing:

We believe that this class is not useful. It's clear that there are certain constructs in XML that may cause problems for certain parser and/or transformer implementations.
But it is also clear that this set will vary from processor to processor, and hence, it simply isn't possible to provide a generic, customizable means for applications to set all parameters relating to security in all implementations. Therefore, we don't see any value in cluttering the API with an additional class; rather, we believe that, in any class where certain implementations may exhibit security problems for certain XML constructs, a feature should be provided to force the implementation to process those constructs in a secure, if non-conformant, manner. The precise semantics of this feature will necessarily have to vary from implementation to implementation; but this is inescapable given the purpose at issue.

The problems that can result are manifested in the current state of the code:

The entity expansion limit and the methods which handle it are underspecified in the spec. In Xerces we define it as "the number of entity expansions that the parser should permit in the document". JAXP provides no definition. Its meaning shouldn't be left to the imagination.

The class's default constructor sets 'reasonable' processing values. What is "reasonable" will surely vary with the implementation--hence, the spec itself cannot make any claims to provide this in a default class that's part of the API and knows nothing of any implementation details; nor can any application that wishes to remain implementation-agnostic meaningfully set this field.

The entity expansion limit in the spec is 100. Xerces' is currently 100,000. Doubtless there are many reasonable documents with more than 100 entity references which don't consume an unreasonable amount of resources in the vast majority of implementations.

Feedback from other implementors leads us to believe that the maxOccurs exploit for an element in a schema is really not inherent to the processing of schema, but is instead the product of implementation choices.
Hence, the spec should not provide any means to limit this that all implementations are then required to support.

In any event, it should also be pointed out that in the description of javax.xml.SecureProcessing, occurrence and occurrences are misspelled in several places as occurance and occurances. maxOccurs is incorrectly referred to as maxOccur.

XMLConstants:

XML_DTD_NS_URI: change the description to say what DOM Level 3 does in a similar context (see http://www.w3.org/TR/2003/CR-DOM-Level-3-Core-20031107/core.html#parameter-schema-type), rather than speaking of arbitrary values.

XMLUtils:

Overall, we think this class is not useful and should be eliminated. The need to deal with both XML 1.0 and 1.1 in this class makes it unwieldy; but why are NCNames called out specifically? Is it not just as reasonable (or, we think, no more unlikely) that a user would be interested in Names, NameStartChars etc.?

javax.xml.validation:

Our feeling is that this specification needs to confine itself to APIs, and that it should leave implementation details entirely up to the implementors. Even if it appears to the authors that all implementations will need to do the same thing in particular places, this should still be left to implementations to encourage the maximum amount of innovation. With this in mind, we believe AbstractSchema--and the package-protected classes that it relies upon--should be removed.

By the same token, it seems to us that the existing factory mechanism is far too complex, and that the model used in the XPath package is much more appropriate. That is, SchemaFactoryFinder should be made private and SchemaFactoryLoader should be removed.

We also noticed that newInstance methods are not consistently labelled final: while XPathFactory.newInstance() is labelled final, SchemaFactory.newInstance() is not.

Turning to the TypeInfoProvider class: getElementTypeInfo() states that "the caller can keep references to the returned TypeInfo longer than the callback scope".
We believe that this puts the performance onus the wrong way round: the primary use for this interface, we feel, is in the SAX world; the tradition in SAX is that applications should copy the information they wish, since all objects passed to the application are owned by the parser. Applications wishing to generate a DOM from these callbacks will need to pay the cost of building the DOM, and we think that obliging them to copy information from the TypeInfo objects so the parser may reuse them is a comparatively small price to pay.

The DOM Level 3 spec also defines isId() on Attr nodes; javadocs for the isIdAttribute() method should refer to this information. Also, the javadoc should probably state that isId can only be understood with reference to whatever grammar specification was used in the production of the given Schema.

It seems many places in this package say "should" or "should be null" when "must" would be preferable. For example, SchemaFactory's protected constructor says that "derived classes *should* create SchemaFactory objects that have [a] null ErrorHandler and null LSResourceResolver"; clearly it would be best for interoperability if this were changed to "must". The same holds for Validator and ValidatorHandler.

We have a strong sense that users will be surprised by the draconian error handling behaviour when no error handling implementation is attached to a Validator, since this behaviour differs from that specified in the rest of the API. This should be called out emphatically in the spec.

3.3.10.1: should specify what happens if we end up with two schemas that have the same targetNamespace. Similarly, the behaviour to be expected for this edge case should be spelled out for the JAXP 1.2 properties. Additionally, whether the ordering of the schema documents themselves is significant should be specified.
That is, if schema A with target namespace tnsA imports schema B with target namespace tnsB, does an implementation's behaviour have to be identical if they are specified as {A,B} and {B,A}? Currently in Xerces, the latter will cause the preparsed B to be used; the former will generate a request for B to be resolved. Also, the reference to 4.2 of the schema spec should be dropped, since that section does not discuss this kind of processor- (or processor API-) defined behaviour.

The same section has text to the effect that if no error handler is registered, no error will be reported to the error handler and an exception gets thrown. This circularity should be cleaned up.

Near the beginning of the ValidatorHandler javadoc, it is stated that "[s]imilarly, the user-specified callback will receive non-null strings for all three parameters". It is ambiguous whether the "user-specified callback" is the ContentHandler registered on the ValidatorHandler, or whether it's the callback that is made to the ValidatorHandler.

It seems that the ValidatorHandler's behaviour when the namespace-prefixes feature is set to false is underspecified. Will namespace attributes be removed if they're present? If they are removed, then the proscription that a "ValidatorHandler may not remove attributes that were present in the input" should be modified so that namespace attributes are specifically not counted as attributes. Are they added when necessary while the start/endPrefixMapping callbacks are still not issued? Namespace attributes, if sent, would be redundant; it seems that the input a ValidatorHandler can expect should be clarified. Our suggestion is that applications calling ValidatorHandlers should pass namespace information in the start/endPrefixMapping methods.

8.3.7: mentions supporting DTDs; "must" should not be used in this context, since DTD support is not required by JAXP 1.3 in the validation API.
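
The error-handling behaviour discussed above can be made concrete with a minimal sketch. It assumes the Validator, SchemaFactory and ErrorHandler signatures as they appear in the javax.xml.validation and org.xml.sax javadoc; the class name ValidatorErrorHandling, the CollectingHandler helper, and the inline schema and document are hypothetical and exist only for illustration. The sketch contrasts a Validator with no ErrorHandler (the first validation error surfaces as an exception) with one whose handler records errors, which also lets the application track validity on its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ValidatorErrorHandling {
    // Hypothetical schema: a single element whose content must be an xs:int.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='root' type='xs:int'/>"
      + "</xs:schema>";

    // Hypothetical instance document that violates the schema.
    static final String BAD_DOC = "<root>not-an-int</root>";

    /** Records recoverable errors instead of throwing (illustrative helper). */
    static class CollectingHandler implements ErrorHandler {
        final List<SAXParseException> errors = new ArrayList<>();
        public void warning(SAXParseException e) { /* ignored in this sketch */ }
        public void error(SAXParseException e) { errors.add(e); }
        public void fatalError(SAXParseException e) throws SAXException { throw e; }
        boolean validSoFar() { return errors.isEmpty(); }
    }

    static Schema schema() throws SAXException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(XSD)));
    }

    /** With no ErrorHandler set, a validation error is thrown to the caller. */
    public static boolean throwsWithoutHandler() throws Exception {
        Validator validator = schema().newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(BAD_DOC)));
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    /** With a handler set, validate() returns and the application checks validity. */
    public static boolean validWithHandler() throws Exception {
        Validator validator = schema().newValidator();
        CollectingHandler handler = new CollectingHandler();
        validator.setErrorHandler(handler);
        validator.validate(new StreamSource(new StringReader(BAD_DOC)));
        return handler.validSoFar();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("throws without handler: " + throwsWithoutHandler());
        System.out.println("valid with handler: " + validWithHandler());
    }
}
```

An application that registers such a handler already knows whether the document has been valid so far, without asking the validator to compute it.
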
ValidatorHandler.isValidSoFar(): We believe that an application should be able to determine this trivially--especially with the very strong incentive provided for the registration of custom error reporters. It does not make sense to oblige parsers to always compute something that only a small number of applications will need, when it would be trivial for the applications to perform the computation.

ignorableWhitespace (8.3.9): the 4th bullet reads to the effect that if certain characters are determined to be ignorable, then the ignorableWhitespace callback should be invoked. "Should" should be replaced by "must", and the fact that this is only meaningful for DTDs should be clearly stated; perhaps a link to the Infoset's [element content whitespace] property should be provided. Also, this need not be justified with reference to DocumentBuilders.

javax.xml.parsers:

If setSchema is used, the reader's attention should be emphatically called to the fact that errors will generate exceptions if no error handler is registered.

The setSchema() methods for both DocumentBuilderFactory and SAXParserFactory state that "[w]hen a Schema is non-null, a parser will use a validator created from it to validate documents before it passes information on to the application". We are strongly of the view that this unacceptably restricts implementors' freedom. This should be reworded to read "[w]hen a Schema is non-null, a parser will behave as if it used a validator created from that Schema to validate documents before it passes information on to the application". This will allow an implementor to use custom, possibly dramatically more efficient, behaviour if the Schema object registered with the Factory is compatible with the parser objects at a level lower than JAXP.

It is not clear to us why it should not be possible to register DOM Level 3 error reporters/entity resolvers with a DocumentBuilder.
If this were done, of course, how the new DOM Level 3 entity resolvers/error reporters would interact with the old ones would need to be specified. Further, it seems to us that it might be useful for a DocumentBuilderFactory to be able to return LSParsers and LSSerializers; since the ability to set Schemas on DocumentBuilderFactories is not duplicated in the DOM, it seems likely that this functionality would be highly useful to the community.

javax.xml.datatype:

1.3.15: adding two durations: "normalized" is undefined. It appears to have a connection with the normalizeWith(Calendar) method, but this should be made clear. Normalization is also not mentioned in subtract; this is at least inconsistent. Since normalizeWith appears to return a normalized duration, there should be a method (isNormalized?) that tells an application whether the Duration object has been normalized with respect to some Calendar.

General:

The javax.xml.(parsers|transform).FactoryFinders are at very considerable variance with what's been in xml-commons for a long time; since this represents the product of much debugging from experience in the field, should the reference implementation not leverage the Apache code? It would also seem prudent for all the factories to leverage a common mechanism for performing actions like finding classLoaders (i.e., what order to consult Context vs. system vs. bootstrap classloaders etc.).

Cheers!
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone: 905-413-3519, T/L 969-3519
E-mail: [EMAIL PROTECTED]
