[design]: grammar caching

neilg Tue, 08 Jan 2002 12:23:23 -0800


Hi folks,


Since Andy quite rightly points out the need to make XNI as stable as
possible before we roll out a production version of Xerces2, I thought
it might be a good time to bring up once again an issue that took a
lot of bandwidth in the summer but was never resolved--what should
grammar caching in Xerces2 look like?

Right now, we've got the rather ironic situation where you can do a
sort of grammar caching with Xerces2, but you've got to use DOM L3's
AS interfaces to do it.  The irony stems from the fact that if you
want performance badly enough to need grammar caching, you'd probably
prefer an event-based API for parsing over the DOM...

So, here goes.  I should note that this is in large measure based on
what Sandy proposed way back when (hope you don't mind too much being
associated with this post Sandy!  :-))

I should first note that this is a pretty heavy proposal.  It's
designed with a view to supporting the needs of the most advanced
applications that require the most flexibility.  I think this makes
sense because it's going to be advanced applications that need
grammar caching to begin with, and in my view crafting a default
implementation for this proposal should be only somewhat more
difficult than developing one for a more scaled-down version.

Since grammar caches (or pools) outlive parser instances, the grammar
pool must be owned by the application.  It's for this reason that
retaining a GrammarResolver (or a GrammarBucket perhaps?) seems to
make sense--a GrammarResolver is owned by a
particular parser instance, and contains whatever subset of the
grammars in the grammar pool that the application, for whatever
reasons of its own, wishes to make available.  This kind of
architecture also allows the application to manage difficult
situations, such as when the parser encounters an imported schema that
does not match the schema the application already knows about for the
same namespace.  Another use-case might be a
webserver which deals with different kinds of XML documents and, once
the type of a document is identified, wishes to enforce validation of
that document with only a subset of the grammars that it has in its
GrammarPool.  With the rise of ever more complex and integrated XML
and web services technologies, it seems to me that this kind of
functionality will have to be made available somewhere--and the
parser level looks to me to be the appropriate place.  After all,
isn't Xerces2 about flexibility, modularity, and giving tools beyond
what are in standard APIs to applications that need them?
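
For what it's worth, here is a rough Java sketch of the "subset" idea
above.  The names (GrammarSubset, select) and the modelling of the pool
as a namespace-keyed Map are assumptions made purely for illustration;
nothing like this exists in Xerces2 today.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: the application-owned pool is modelled as a Map from
// namespace to grammar; G stands in for whatever the grammar type turns out to be.
final class GrammarSubset {
    private GrammarSubset() {}

    // Copies into a fresh per-parser bucket only those pooled grammars whose
    // namespace the application has chosen to expose for this document type.
    static <G> Map<String, G> select(Map<String, G> pool, Set<String> allowedNamespaces) {
        Map<String, G> bucket = new LinkedHashMap<>();
        for (Map.Entry<String, G> entry : pool.entrySet()) {
            if (allowedNamespaces.contains(entry.getKey())) {
                bucket.put(entry.getKey(), entry.getValue());
            }
        }
        return bucket;
    }
}
```

The pool itself is left untouched, so the same pool could serve many
parser instances, each with a different view of it.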

Conceptually, the flow would look something like this:  An application
instantiates a certain parser Configuration, and associates its
GrammarPool implementation as a property of that Configuration.  When a
Validator object in the configuration begins to validate, it requests
that a bucket of grammars of the appropriate kind be filled by the
GrammarPool.  As it parses, the Validator takes grammars from the
bucket if it can, then gives the GrammarPool a chance to provide the
grammar if it wishes, then the XMLEntityResolver gets a chance to
resolve the request to a file of the appropriate type.  At both these
stages, as much information is provided to the GrammarPool and the
XMLEntityResolver as is likely to prove at all helpful in identifying
the appropriate resource.  At the conclusion of parsing, the validator
will make available the contents of its bucket to the GrammarPool,
which can then determine whether to incorporate the grammars or ignore
them.  I tend to be of the view that parsing should be aborted with a
fatalError if something goes wrong in the grammar retrieval
process--e.g., if the GrammarPool gives a Validator a grammar of the
wrong type--but this is certainly a thorny and multifaceted question.
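
To make the lookup order concrete, here is a minimal Java sketch of
the flow just described: bucket first, then the pool, then the entity
resolver.  Every type in it (SketchGrammar, SketchPool, SketchResolver,
SketchValidator) is a hypothetical stand-in, and grammars are
identified by a plain String key rather than a full description, so
please read it as an illustration of the ordering only.

```java
import java.util.HashMap;
import java.util.Map;

// All types here are hypothetical stand-ins; grammars are identified by a
// plain String key (e.g. a namespace or system ID) to keep the sketch short.
final class SketchGrammar {
    final String key;
    final String source; // "pool", or the location the resolver returned
    SketchGrammar(String key, String source) {
        this.key = key;
        this.source = source;
    }
}

interface SketchPool {
    SketchGrammar retrieveGrammar(String key); // null means "not cached"
}

interface SketchResolver {
    String resolve(String key); // location of a grammar document, or null
}

final class SketchValidator {
    private final Map<String, SketchGrammar> bucket = new HashMap<>();
    private final SketchPool pool;
    private final SketchResolver resolver;

    SketchValidator(SketchPool pool, SketchResolver resolver) {
        this.pool = pool;
        this.resolver = resolver;
    }

    // Lookup order: bucket first, then the pool, then the entity resolver.
    SketchGrammar lookup(String key) {
        SketchGrammar grammar = bucket.get(key);
        if (grammar != null) {
            return grammar;
        }
        grammar = pool.retrieveGrammar(key);
        if (grammar == null) {
            String location = resolver.resolve(key);
            if (location == null) {
                return null; // nobody could supply this grammar
            }
            // A real validator would parse the document at `location` here.
            grammar = new SketchGrammar(key, location);
        }
        bucket.put(key, grammar);
        return grammar;
    }

    // What would be offered back to the pool when parsing ends.
    Map<String, SketchGrammar> bucketContents() {
        return new HashMap<>(bucket);
    }
}
```

Whether a failed lookup should return null, as here, or abort with a
fatalError is exactly the open question raised above; the sketch takes
no position on it.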

So, in addition to the interfaces given below, I'd propose to rename
XSGrammarResolver to XSGrammarBucket (sorry about the cheesy name...
:-)).  The same fate should probably befall the GrammarPool class that
XMLDTDValidator makes reference to.

public interface XMLGrammarPool {

    // we are trying to make this XMLGrammarPool work for all kinds of
    // grammars, so we have a parameter "grammarType" for each of the
    // methods.
    // It could be "schema", "dtd", etc., or it could be recast into an
    // integer.

    // retrieve the initial known set of grammars. this method is
    // called by a validator before the validation starts. the application
    // can provide an initial set of grammars available to the current
    // validation attempt.
    public Grammar[] retrieveInitialGrammarSet(String grammarType);

    // return the final set of grammars that the validator ended up
    // with.
    // This method is called after the
    // validation finishes. The application may then choose to cache some
    // of the returned grammars.
    public void cacheGrammars(String grammarType, Grammar[] grammars);

    // This method requests that the application retrieve a grammar
    // corresponding to the given GrammarDescription from its cache.
    // If it cannot do so it must return null; the parser will then
    // call the EntityResolver.  An application must not call its
    // EntityResolver itself from this method.
    public Grammar retrieveGrammar(String grammarType,
                                             XMLGrammarDescription desc);

} // XMLGrammarPool
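
As a sanity check on how heavy a default implementation would be, here
is a minimal in-memory sketch of the interface above.  It repeats the
proposed interfaces so that it compiles on its own.  Several things in
it are assumptions not settled by this proposal: that a Grammar can
report its own description (getDescription), that grammars are keyed
by system ID with public ID as a fallback, and that the first grammar
cached under a key wins.  SimpleDescription, SimpleGrammar and
SimpleGrammarPool are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

interface XMLGrammarDescription {
    String getPublicID();
    String getSystemID();
    String getBaseURI();
}

// Placeholder (assumption): a grammar that can report its own description.
interface Grammar {
    XMLGrammarDescription getDescription();
}

interface XMLGrammarPool {
    Grammar[] retrieveInitialGrammarSet(String grammarType);
    void cacheGrammars(String grammarType, Grammar[] grammars);
    Grammar retrieveGrammar(String grammarType, XMLGrammarDescription desc);
}

// Hypothetical value classes, used only to exercise the pool.
final class SimpleDescription implements XMLGrammarDescription {
    private final String publicID;
    private final String systemID;
    private final String baseURI;
    SimpleDescription(String publicID, String systemID, String baseURI) {
        this.publicID = publicID;
        this.systemID = systemID;
        this.baseURI = baseURI;
    }
    public String getPublicID() { return publicID; }
    public String getSystemID() { return systemID; }
    public String getBaseURI() { return baseURI; }
}

final class SimpleGrammar implements Grammar {
    private final XMLGrammarDescription description;
    SimpleGrammar(XMLGrammarDescription description) { this.description = description; }
    public XMLGrammarDescription getDescription() { return description; }
}

final class SimpleGrammarPool implements XMLGrammarPool {
    // grammarType -> (key -> grammar); insertion order is preserved.
    private final Map<String, Map<String, Grammar>> byType = new LinkedHashMap<>();

    // Assumed keying rule: system ID if present, otherwise public ID.
    private static String keyOf(XMLGrammarDescription desc) {
        return desc.getSystemID() != null ? desc.getSystemID() : desc.getPublicID();
    }

    public synchronized Grammar[] retrieveInitialGrammarSet(String grammarType) {
        Map<String, Grammar> grammars = byType.get(grammarType);
        return grammars == null ? new Grammar[0] : grammars.values().toArray(new Grammar[0]);
    }

    public synchronized void cacheGrammars(String grammarType, Grammar[] grammars) {
        Map<String, Grammar> cached =
            byType.computeIfAbsent(grammarType, t -> new LinkedHashMap<String, Grammar>());
        for (Grammar grammar : grammars) {
            String key = keyOf(grammar.getDescription());
            if (key != null) {
                cached.putIfAbsent(key, grammar); // first grammar cached under a key wins
            }
        }
    }

    // Returns null on a miss, leaving the parser free to call the resolver.
    public synchronized Grammar retrieveGrammar(String grammarType, XMLGrammarDescription desc) {
        Map<String, Grammar> cached = byType.get(grammarType);
        String key = keyOf(desc);
        return (cached == null || key == null) ? null : cached.get(key);
    }
}
```

The methods are synchronized because the pool outlives any one parser
and may well be shared between threads; a real implementation would
probably want something finer-grained.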


public interface XMLGrammarDescription {
    public String getPublicID();
    public String getSystemID();
    public String getBaseURI();
} // XMLGrammarDescription

public class DTDDescription implements XMLGrammarDescription {
    // used to indicate whether it's an internal or external DTD
    public final static int INTERNAL_DTD = 0;
    public final static int EXTERNAL_DTD = 1;

    public int getDTDType();

    // this returns the name of the root element if this is a DOCTYPE
    // entity, or the name of the entity if it's a standard entity
    // declaration.
    public String getEntityName();
}

public class XSDDescription implements XMLGrammarDescription {
    // used to indicate what triggered the call
    // we don't include xsi:schemaLocation/noNamespaceSchemaLocation
    // because we'll defer the loading of schema documents until
    // a component from that namespace is referenced from the instance
    public final static int CONTEXT_INCLUDE   = 0;
    public final static int CONTEXT_REDEFINE  = 1;
    public final static int CONTEXT_IMPORT    = 2;
    public final static int CONTEXT_ELEMENT   = 3;
    public final static int CONTEXT_ATTRIBUTE = 4;
    public final static int CONTEXT_XSITYPE   = 5;

    public int getContextType();

    // for include and redefine, the namespace will be the target
    // namespace of the enclosing document. (or empty string?)
    public String getTargetNamespace();

    // for import and xsi:location attributes, it's possible to have
    // multiple hints for one namespace. so it's an array whose first
    // element derives from the noNamespaceSchemaLocation or
    // schemaLocation property, depending on whether or not there is
    // a targetNamespace:
    public String[] getLocationHints();

    // If it's triggered by the document, the name of the
    // triggering component: element, attribute or xsi:type
    public QName getTriggeringComponent();

    // More information about "other location hint":
    // everything about the enclosing element
    public QName getEnclosingElementName();
    public XMLAttributes getAttributes();
}

/**
 * This interface is used to resolve external parsed entities. The
 * application can register an object that implements this interface
 * with the parser configuration in order to intercept entities and
 * resolve them explicitly. If the registered entity resolver cannot
 * resolve the entity, it should return <code>null</code> so that the
 * parser will try to resolve the entity using a default mechanism.
 *
 * @see XMLParserConfiguration
 *
 * @author Andy Clark, IBM
 *
 * @version $Id: XMLEntityResolver.java,v 1.2 2001/08/23 00:35:37 lehors Exp $
 */
public interface XMLEntityResolver {

    //
    // XMLEntityResolver methods
    //

    /**
     * Resolves an external parsed entity. If the entity cannot be
     * resolved, this method should return null.
     *
     * @param desc  a description of the entity (grammar, abstract
     *      schema) being sought.
     * @throws XNIException Thrown on general error.
     * @throws IOException  Thrown if resolved entity stream cannot be
     *                      opened or some other i/o error occurs.
     */
    public XMLInputSource resolveEntity(XMLGrammarDescription desc)
        throws XNIException, IOException;

} // interface XMLEntityResolver
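
To illustrate why handing the resolver a full description is useful,
here is a small Java sketch of a resolver that prefers a local catalog
entry for a namespace over the location hints found in the document.
SchemaRequest and CatalogResolver are hypothetical stand-ins: the
former models only two accessors of the schema description above, and
the latter returns a location String where the real interface would
return an XMLInputSource.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the schema description in the proposal; only the
// two accessors used below are modelled.
interface SchemaRequest {
    String getTargetNamespace();
    String[] getLocationHints();
}

// Hypothetical resolver: returns a location String where the real interface
// would return an XMLInputSource.
final class CatalogResolver {
    private final Map<String, String> catalog = new HashMap<>();

    void register(String namespace, String localLocation) {
        catalog.put(namespace, localLocation);
    }

    // Prefer the application's own catalog entry for the namespace; otherwise
    // fall back to the first location hint; otherwise return null so the
    // parser can apply its default mechanism.
    String resolveEntity(SchemaRequest request) {
        String namespace = request.getTargetNamespace();
        if (namespace != null && catalog.containsKey(namespace)) {
            return catalog.get(namespace);
        }
        String[] hints = request.getLocationHints();
        if (hints != null && hints.length > 0) {
            return hints[0];
        }
        return null;
    }
}
```

A resolver that only saw a system ID could not make this kind of
choice, which is the main argument for the richer parameter.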

Very much looking forward to some spirited, focused open-source
discussion!

Cheers,
Neil
Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  [EMAIL PROTECTED]

