Hi,
My comments use the following terms:
GrammarCache stores grammars that are already known.
GrammarResolver resolves the location of a grammar that is not in the cache.
> For example, a grammar for namespace N is already known to the parser, then
> it sees an element from N. Does the parser use that known grammar, or does
> it ask the application for another one? Another example: A imports B and C,
> then both B and C import D. Does the parser ask the application for D twice
> (which might result in two different D's, and note that it would be an
> error if the two D's have element declarations of the same name)?
What is a grammar "already known to the parser"? Doesn't the parser ask the
GrammarCache first for *all* grammars? It is not the parser's responsibility
to know whether a grammar is already cached.
> For an xsi:schemaLocation/noNamespaceSchemaLocation attribute,
> a. if a grammar for such namespace is already known to the parser, then
> that grammar is used;
> b. otherwise, the application will be given an opportunity to provide the
> grammar object for that namespace;
> c. if a grammar is not provided from (b), then the application can override
> the schemaLocation hint.
This is a job for a GrammarResolver.
> I don't know DTD as much as folks do to come up with a good algorithm. So I
> need help from you guys again. We might need to consider the following
> questions.
> a. How to cache internal DTD subset vs. external DTD subset;
From my point of view it does not make sense to cache the internal DTD
subset, as it is per document. It makes sense to cache only the external
entities referenced from it.
> b. What's the minimum unit of DTD grammar to cache? An internal subset? An
> external subset (corresponding to a .DTD file)? Or a group of
> internal/external subsets?
I am advocating an external entity identified by the public ID referenced
in the DOCTYPE. However, some users may need a different granularity.
> 2. Note that the two "resolveGrammarLocation" methods at the end seem
> overlapping with "resolveEntity" method of "EntityResolver": they are all
> used to override an location. And in Xerces1, we did use EntityResolver to
> override schema location hints. So it causes confusion when both
> GrammarResolver and EntityResolver are specified: which one to use, or
> which one takes higher priority? To solve this, we can drop the two
> "resolveGrammarLocation" methods, and use the EntityResolver all the time,
> but we are risking losing grammar type and, possibly, namespace
> information. We currently lean more to another solution: derive
> GrammarResolver from EntityResolver. So that you only need to set one
> resolver for such purpose, and we can clearly state which method(s) (the
> one from EntityResolver or GrammarResolver) takes precedence. Any comments
> on either of the two approaches?
I am for a third approach: create a new GrammarResolver interface that is
similar to EntityResolver but is used only for interpreting the
schemaLocation attribute. That is the responsibility of neither
EntityResolver nor GrammarCache.
> 3. There is still a problem for the design of schema caching. Assume
> grammar A (that is, a grammar with target namespace A) is known to the
> parser, then the parser asks the application for grammar B. The application
> returns grammar B, which imports a different grammar A. Now the two A's
> conflict (we assumed a one-grammar-per-namespace rule). To avoid such
> confliction, in the GrammarResolver interface, we ask the application to
> provide a different grammar B in this case (method grammarConflict()).
I assume that the cached grammar B only references grammar A rather than
containing it, so the parser can use its own copy of A.
> 4. The parser doesn't interact with a grammar pool directly. It's the job
> of the GrammarResolver to get grammars from the pool, and give them to the
> parser. So the caching detail is separated from the validation process of
> the parser, and is under full control of the application.
I am not sure I understand this point. Why does the parser not interact
with the pool directly? (I would prefer to call it GrammarCache.)
>
> 5. Xerces2 will provide a default implementation of GrammarResolver, which
> interacts with a default implementation of grammar pool:
> a. The pool is shared across the application;
> b. The pool is thread safe;
> c. Put every grammar in the pool;
> d. Always try to get grammars from the pool first.
e. The pool holds grammars through WeakReferences or references that expire
after a timeout.
> Is this default behavior satisfying? Of course, people can always
> contribute other general-purpose GrammarResolver/GrammarPool
> implementations.
>
public interface GrammarCache {
    // Can be implemented by a Map.
    public Grammar get(Key key);
    public Grammar put(Key key, Grammar grammar);

    // Empty marker interface for cache keys.
    public interface Key {
    }
}
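To illustrate point (e), here is a minimal sketch of a Map-backed cache that holds grammars only weakly. It is not the Xerces2 implementation: the Grammar class is a hypothetical stand-in, keys are plain strings instead of GrammarCache.Key for brevity, and the class names are mine.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class WeakGrammarCacheSketch {
    /** Hypothetical stand-in for the parser's grammar type. */
    static final class Grammar {
        final String name;
        Grammar(String name) { this.name = name; }
    }

    /** Thread-safe cache whose entries may be reclaimed by the garbage collector. */
    static final class WeakCache {
        private final Map<String, WeakReference<Grammar>> map = new ConcurrentHashMap<>();

        /** Returns the cached grammar, or null if absent or already collected. */
        Grammar get(String key) {
            WeakReference<Grammar> ref = map.get(key);
            if (ref == null) {
                return null;
            }
            Grammar grammar = ref.get();
            if (grammar == null) {
                map.remove(key, ref); // referent was collected: drop the stale entry
            }
            return grammar;
        }

        /** Stores a grammar; returns the previous one if it is still alive. */
        Grammar put(String key, Grammar grammar) {
            WeakReference<Grammar> old = map.put(key, new WeakReference<>(grammar));
            return old == null ? null : old.get();
        }
    }

    public static void main(String[] args) {
        WeakCache cache = new WeakCache();
        Grammar grammar = new Grammar("http://example.org/ns");
        cache.put("http://example.org/ns", grammar);
        // The grammar is still strongly referenced here, so the lookup finds it.
        System.out.println(cache.get("http://example.org/ns") == grammar);
    }
}
```

With weak references a grammar stays cached only while something else (for example a parser in use) still holds it, so a timeout-based variant may suit applications that parse in bursts.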
public interface CacheKeyFactory {
    // GrammarCache.Key createKey(String namespace);
    // GrammarCache.Key createKey(String pID, String sID);
    GrammarCache.Key createKey(String grammarType,
                               String grammarKey,
                               String hint);
}
public interface GrammarResolver {
    // Ask the application to override an import schemaLocation.
    public XMLInputSource resolveGrammarLocation(String grammarType,
                                                 String grammarKey,
                                                 String hint);

    // Ask the application to override an include/redefine schemaLocation.
    public XMLInputSource resolveGrammarLocation(String grammarType,
                                                 String hint);
}
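As an illustration only, an application-side resolver could be a simple catalog that maps namespaces (or hints) to local copies. In this sketch LocationSource is a hypothetical stand-in for XMLInputSource, register() is my own helper, and returning null to mean "no override, use the hint" is an assumption borrowed from the EntityResolver convention.

```java
import java.util.HashMap;
import java.util.Map;

final class CatalogResolverSketch {
    /** Hypothetical stand-in for XMLInputSource: carries only a system ID. */
    static final class LocationSource {
        final String systemId;
        LocationSource(String systemId) { this.systemId = systemId; }
    }

    // Maps a namespace (import) or a hint (include/redefine) to a local location.
    private final Map<String, String> overrides = new HashMap<>();

    /** Hypothetical helper: registers a local location for a namespace or hint. */
    void register(String namespaceOrHint, String localLocation) {
        overrides.put(namespaceOrHint, localLocation);
    }

    /** Import case: the namespace (grammarKey) decides, the hint is ignored. */
    LocationSource resolveGrammarLocation(String grammarType, String grammarKey, String hint) {
        String local = overrides.get(grammarKey);
        return local == null ? null : new LocationSource(local);
    }

    /** Include/redefine case: there is no namespace, so the hint is the only handle. */
    LocationSource resolveGrammarLocation(String grammarType, String hint) {
        String local = overrides.get(hint);
        return local == null ? null : new LocationSource(local);
    }

    public static void main(String[] args) {
        CatalogResolverSketch resolver = new CatalogResolverSketch();
        resolver.register("urn:example:orders", "file:/schemas/orders.xsd");
        LocationSource source = resolver.resolveGrammarLocation(
                "XMLSchema", "urn:example:orders", "http://example.org/orders.xsd");
        System.out.println(source.systemId);
    }
}
```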
Sequence from the parser's point of view:
    $key = ask CacheKeyFactory for a key
    $grm = ask GrammarCache for the Grammar under $key
    if $grm == null {
        $in  = ask GrammarResolver for an InputSource
        $grm = constructGrammar($in)
        put $grm into GrammarCache under $key
    }
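The same sequence can be sketched in Java. This is only an illustration under my own assumptions: Grammar and LocationResolver are hypothetical stand-ins, the key is built from grammar type and namespace only (the hint is deliberately left out of the key), and a null answer from the resolver means "use the hint".

```java
import java.util.HashMap;
import java.util.Map;

final class GrammarLookupSketch {
    /** Hypothetical stand-in for a constructed grammar. */
    static final class Grammar {
        final String location;
        Grammar(String location) { this.location = location; }
    }

    /** Hypothetical stand-in for GrammarResolver; returns a location or null. */
    interface LocationResolver {
        String resolveGrammarLocation(String grammarType, String grammarKey, String hint);
    }

    private final Map<String, Grammar> cache = new HashMap<>();
    private final LocationResolver resolver;
    int loads = 0; // counts how often a grammar was actually constructed

    GrammarLookupSketch(LocationResolver resolver) {
        this.resolver = resolver;
    }

    /** Assumption: the hint does not take part in the key. */
    static String createKey(String grammarType, String grammarKey, String hint) {
        return grammarType + "|" + grammarKey;
    }

    Grammar lookup(String grammarType, String grammarKey, String hint) {
        String key = createKey(grammarType, grammarKey, hint);
        Grammar grammar = cache.get(key);
        if (grammar == null) {
            String location = resolver.resolveGrammarLocation(grammarType, grammarKey, hint);
            if (location == null) {
                location = hint; // no override: fall back to the schemaLocation hint
            }
            grammar = new Grammar(location); // stands in for constructGrammar($in)
            loads++;
            cache.put(key, grammar);
        }
        return grammar;
    }

    public static void main(String[] args) {
        GrammarLookupSketch parser = new GrammarLookupSketch((type, key, hint) -> null);
        // B and C both import D, each with its own hint.
        Grammar viaB = parser.lookup("XMLSchema", "urn:D", "b/d.xsd");
        Grammar viaC = parser.lookup("XMLSchema", "urn:D", "c/d.xsd");
        System.out.println((viaB == viaC) + " loads=" + parser.loads); // prints "true loads=1"
    }
}
```

Under the assumption that the key ignores the hint, the second import of D is answered from the cache, so the application is not asked for D twice.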
> Did I miss anything?
Maybe I missed something myself, since I commented on so much.
Cc.
--
Petr Kuzel, Sun Microsystems
: Forte Tools (http://www.sun.com/forte/ffj/ie/)
: XML and Jini (http://jini.netbeans.org/) modules