Hi, all. It might be too late, but please allow me to propose my idea.




I think the grammar caching actually has two layers.

The lower layer should address the same issue that EntityResolver
addresses: resolving the actual location of the grammar file.

I feel that EntityResolver is not quite satisfactory because it was
designed for DTDs. It takes a public ID and a system ID, but a public ID
doesn't make sense for other schema languages.

So I suggest

interface GrammarResolver {

    XMLInputSource resolveGrammar(
        // perhaps the namespace URI of the schema language
        String schemaLanguage,

        // the namespace URI of the grammar.
        // Note that for some schema languages this doesn't
        // make sense at all. Maybe we can pass the public ID
        // here if this interface is used for DTDs; for SOX,
        // we would pass a URN.
        String namespaceURI,

        // the location specified in the document.
        // In some schema languages, sometimes no location
        // is specified.
        String systemID);
}

I think this interface works with DTD, XML Schema, SOX, RELAX NG and
TREX.
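For illustration, here is one possible implementation of the proposed
interface. The `XMLInputSource` stand-in class and the jar path are made
up just to keep the sketch self-contained; a real implementation would
use the Xerces class of that name.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Xerces' XMLInputSource, only so this sketch compiles
// on its own; the real class lives inside Xerces.
class XMLInputSource {
    final String systemId;
    XMLInputSource(String systemId) { this.systemId = systemId; }
}

interface GrammarResolver {
    XMLInputSource resolveGrammar(String schemaLanguage,
                                  String namespaceURI,
                                  String systemID);
}

// A resolver that pins well-known namespaces to local copies and
// otherwise falls back to the location given in the document.
class LocalCopyResolver implements GrammarResolver {
    private final Map<String, String> localCopies =
        new HashMap<String, String>();

    LocalCopyResolver() {
        // hypothetical local copy of the MathML schema
        localCopies.put("http://www.w3.org/1998/Math/MathML",
                        "jar:file:schemas.jar!/mathml.xsd");
    }

    public XMLInputSource resolveGrammar(String schemaLanguage,
                                         String namespaceURI,
                                         String systemID) {
        String local = localCopies.get(namespaceURI);
        return new XMLInputSource(local != null ? local : systemID);
    }
}

public class ResolverDemo {
    public static void main(String[] args) {
        GrammarResolver resolver = new LocalCopyResolver();
        XMLInputSource source = resolver.resolveGrammar(
            "http://www.w3.org/2001/XMLSchema",   // schema language
            "http://www.w3.org/1998/Math/MathML", // grammar namespace
            null);                                // no location given
        System.out.println(source.systemId);
    }
}
```

Note that the same resolver works regardless of the schema language,
which is the point of keying it on the language's namespace URI.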




The second, higher layer addresses the issue of recompilation overhead.
That is, it achieves higher performance by keeping compiled grammars
somewhere.

As for this layer, maybe I'm just showing my own limitations, but
Sandy's proposal seems complicated to me. It's probably because I can't
see why such fine-grained caching is necessary.


Let's take an example. Say I'm developing a server that

1. accepts MathML documents from customers.
2. prints them out nicely on an expensive printer.
3. sends them back to the customer by USPS.

In this kind of application, you have a very limited set of input
document types. I believe 99% of applications fall into this category,
because without assuming something about the input documents, you cannot
do much that is useful.

Now, as the developer of this server, all I need is to cache the grammar
of one schema file. That file is probably packaged in a jar file.

To me, it is very natural to compile a schema before I read any XML
document. So rather than having the parser call back into my
application, I want to hand a grammar to the parser *before* validation
starts.


This is parallel to the design of the javax.xml.transform package. You
have a compiler (TransformerFactory), a schema (Templates) and a
validator (Transformer).

For this, Xerces only needs to expose the schema compiler.
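To make the parallel concrete, here is how the javax.xml.transform
pipeline compiles a stylesheet once and reuses the compiled object. A
grammar compiler could look exactly the same; the identity stylesheet
below is just a placeholder for "some schema".

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TemplatesDemo {
    // A trivial identity stylesheet, standing in for a schema.
    private static final String IDENTITY_XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'><xsl:copy>"
        + "<xsl:apply-templates select='@*|node()'/>"
        + "</xsl:copy></xsl:template></xsl:stylesheet>";

    public static void main(String[] args) throws Exception {
        // Compile once (expensive)...
        TransformerFactory factory = TransformerFactory.newInstance();
        Templates templates = factory.newTemplates(
            new StreamSource(new StringReader(IDENTITY_XSLT)));

        // ...then reuse the compiled object per document (cheap).
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(
            new StreamSource(new StringReader("<a>hello</a>")),
            new StreamResult(out));
        System.out.println(out.toString().contains("hello"));
    }
}
```

The application, not the parser, decides when compilation happens and
where the compiled object is kept, which is exactly the "grammar first"
approach.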


I think the following interface works nicely for this purpose:

http://iso-relax.sourceforge.net/apiDoc/org/iso_relax/verifier/package-summary.html




Let's take another example. This covers the remaining 1% of
applications, which assume nothing about their input documents. I can
hardly imagine such use cases, so I'd appreciate it if someone could
provide better examples.

Now say I'm developing an XSLT server, which accepts an XML document and
an XSLT stylesheet. My server transforms the document and then sends the
result back to the client.

Since I have no idea what kind of XML documents I receive, the above
"grammar first" approach doesn't work [1].

I don't think the developers of these applications want to cache
grammars, because they too have no idea where those grammars come from.
Since one cannot even expect a namespace-to-grammar relationship, how
could they cache grammars?

It seems to me that Sandy's interface tries to address this kind of
application, but I doubt such applications exist.


------------------
[1] By the way, if I have no idea what kind of XML documents I will
receive, then why do I want to validate them in the first place? That's
why I can't think of any use case. What do others think?


regards,
--
Kohsuke KAWAGUCHI                          +1 650 786 0721
Sun Microsystems                   [EMAIL PROTECTED]

