Re: grammar caching requirements

neilg Thu, 07 Feb 2002 08:19:17 -0800

Hi Neeraj,

Two thoughts:

1.  Given that we already have the EntityResolver interface, which is
optimized for DTDs, I think having some kind of externalDTDLocation
property would just muddy the issue.  Besides, complex questions about spec
compliance would get raised--what if a document's internal subset contains
an entity decl that overrides an entity in one of the externally-defined
DTDs?  Do we somehow recompute our grammar?  Do we ignore it but still
send entity-declaration events down the pipeline, misleading the
application into thinking we're actually going to use these entities?  Do
we ignore the whole internal subset--then what events *do* we send to any
DTDHandlers that might be lurking around?  These are tough questions and I
doubt we'll find good answers for everyone; so let's not make it seem easy
with some simple-sounding override mechanism.

2.  "seeding" the XMLGrammarPool with grammars:  Since we're planning to
permit preparsing on this configuration, I don't really see the need.
Unless you're thinking the application might just serialize the
grammars--in which case why not just serialize the GrammarPool?  It would
be nice if our grammars were serializable though; I just didn't think we'd
necessarily have time for this in the next little while.  Something to come
back to perhaps.
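
For reference, the EntityResolver route from point 1 might look roughly like
the sketch below. Only org.xml.sax.EntityResolver and InputSource are real SAX
API here; the class name CachedDtdResolver, its register method, and the
in-memory map are made up for illustration. It covers only the case where the
document has a DOCTYPE and does nothing about the internal-subset questions
raised above.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: serves DTD text from an in-memory map keyed by public ID.
class CachedDtdResolver implements EntityResolver {
    private final Map<String, String> dtdByPublicId = new HashMap<>();

    // Illustrative helper (not part of SAX): remember DTD text for a public ID.
    void register(String publicId, String dtdText) {
        dtdByPublicId.put(publicId, dtdText);
    }

    // Returning null tells the parser to fall back to its default resolution.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = dtdByPublicId.get(publicId);
        if (dtd == null) {
            return null;
        }
        InputSource source = new InputSource(new StringReader(dtd));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

An application would hand an instance to the parser with
XMLReader.setEntityResolver before parsing.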

So what piece of the coding do you want to help out with, Neeraj?  :-)

Cheers,
Neil

Neil Graham
XML Parser Development
IBM Toronto Lab
Phone:  905-413-3519, T/L 969-3519
E-mail:  [EMAIL PROTECTED]

Neeraj Bajaj <[EMAIL PROTECTED]> on 02/07/2002 10:37:35 AM

Please respond to [EMAIL PROTECTED]

To:   [EMAIL PROTECTED]
cc:
Subject:  Re: grammar caching requirements

> Andy Clark wrote:-
> So I think we *should* keep this information linked: grammars
> and grammar descriptions. Perhaps something like the following:
>
>   public interface Grammar {
>     public XMLGrammarDescription getGrammarDescription();
>   }
>
>   public interface XMLGrammarDescription
>     extends XMLResourceIdentifier {
>     public String getGrammarType();
>   }
>
> The XMLGrammarPool interface would stay the same.

> I think that it would be simpler in the end to keep the grammar
> description information with the grammar.

> If a grammar doesn't keep a copy of its associated grammar
> description then each grammar pool instance needs to have
> all of the logic to determine if a requested grammar
> description properly identifies a registered grammar.
>
> Does this make sense?

It makes sense :) It is more usable and extensible than having a grammar
object keep information about its type only. I feel it keeps our options
open to add more things that we may find are required to identify (other)
grammars in future.

We need constants for the different kinds of grammars, which help the
validation components and the grammar pool identify what kind of grammar
they are interacting with. We should move the two grammar-type constants to
XMLGrammarDescription.

public interface XMLGrammarDescription
    extends XMLResourceIdentifier {

    public static final String XML_SCHEMA = "XSD";
    public static final String XML_DTD = "DTD";

    public String getGrammarType();
}
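
To make Andy's point concrete, here is a rough sketch of a pool that relies
only on each grammar's own description. Everything in it is a simplified
stand-in rather than the real parser code: XMLResourceIdentifier is cut down
to a single method, and SimpleDescription, SimpleGrammar, and
DescriptionMatchingPool are hypothetical names. The matching rule (same
grammar type and same system id) is also just an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Stand-in for the real resource identifier, cut down to one method (assumption).
interface XMLResourceIdentifier {
    String getExpandedSystemId();
}

interface XMLGrammarDescription extends XMLResourceIdentifier {
    String XML_SCHEMA = "XSD";
    String XML_DTD = "DTD";

    String getGrammarType();
}

interface Grammar {
    XMLGrammarDescription getGrammarDescription();
}

// Hypothetical description: two descriptions match when type and system id match.
final class SimpleDescription implements XMLGrammarDescription {
    private final String type;
    private final String systemId;

    SimpleDescription(String type, String systemId) {
        this.type = type;
        this.systemId = systemId;
    }

    @Override public String getGrammarType() { return type; }
    @Override public String getExpandedSystemId() { return systemId; }

    @Override public boolean equals(Object other) {
        if (!(other instanceof SimpleDescription)) {
            return false;
        }
        SimpleDescription that = (SimpleDescription) other;
        return Objects.equals(type, that.type) && Objects.equals(systemId, that.systemId);
    }

    @Override public int hashCode() { return Objects.hash(type, systemId); }
}

// Hypothetical grammar that only carries its description.
final class SimpleGrammar implements Grammar {
    private final XMLGrammarDescription description;

    SimpleGrammar(XMLGrammarDescription description) { this.description = description; }

    @Override public XMLGrammarDescription getGrammarDescription() { return description; }
}

// Hypothetical pool: it holds no identification logic beyond comparing descriptions.
final class DescriptionMatchingPool {
    private final List<Grammar> grammars = new ArrayList<>();

    void putGrammar(Grammar grammar) { grammars.add(grammar); }

    // Returns the first grammar whose own description equals the request, or null.
    Grammar retrieveGrammar(XMLGrammarDescription wanted) {
        for (Grammar grammar : grammars) {
            if (grammar.getGrammarDescription().equals(wanted)) {
                return grammar;
            }
        }
        return null;
    }
}
```

Because the comparison lives in the description, a different pool
implementation would not need to repeat it.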

> > 3.  It should encompass both XML Schemas and DTD's;
>
> And more...
>
> > 4.  It should permit grammars to be preparsed or cached as they are
> >     encountered while validating instance documents;
> > 5.  It should permit the application to "lock" the cache, that is,
> >     prevent any more grammars from being added.
>
> And we need to be able to allow a DTD grammar to a) be
> used in the case where the document contains no DOCTYPE
> line.

I am of the opinion that we should keep these things out of this
configuration. I see this configuration as responsible for caching grammars
and making them available to the parser beforehand, though it can always be
extended if required.

Moreover, I think this is a feature which is not limited to this
configuration, and we should provide it by some other means, something like
the external-schema-location we have for schema grammars.

> b) override the grammar specified in the DOCTYPE declaration.

If a grammar with the same root element is already available to the parser
as part of the initial grammar set provided by the application before
validation begins, will the parser still ask the application for the
grammar? IMO, it should not. I consider this something the application is
knowingly and intentionally telling the parser to validate instances with.
If the grammar is not available to the parser, then the parser should give
the application a chance to provide it from its cache. But I think this
definitely requires more thought. I feel the same would hold true for
schemas?

I think many applications would like to preparse all the grammars and
provide them as an initial set to the parser before parsing begins.
Applications would like to provide their own implementation by setting the
grammar pool property. We can make it easy for the application by
introducing a method in XMLGrammarPoolImpl, making it more reusable, which
allows a set of grammars to be incorporated into its data structures, thus
making them available to the parser as part of the initial grammar set.
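
A minimal sketch of what such a bulk-add method could look like, combined
with the "lock" idea from requirement 5. This is not the XMLGrammarPoolImpl
API: the class name SeedableGrammarPool, the method names, and the string keys
are all invented for illustration, and whether seeding a locked pool should
fail or be silently ignored is an open choice (here it fails).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pool; the method names are illustrative, not the XMLGrammarPoolImpl API.
final class SeedableGrammarPool<G> {
    private final Map<String, G> grammars = new HashMap<>();
    private boolean locked = false;

    // Bulk-adds preparsed grammars before parsing starts; refuses once locked.
    void seed(Map<String, G> preparsed) {
        if (locked) {
            throw new IllegalStateException("pool is locked");
        }
        grammars.putAll(preparsed);
    }

    // Called as grammars are encountered; returns false (and stores nothing) once locked.
    boolean cache(String key, G grammar) {
        if (locked) {
            return false;
        }
        grammars.put(key, grammar);
        return true;
    }

    // Returns the grammar stored under the key, or null if there is none.
    G retrieve(String key) { return grammars.get(key); }

    void lock() { locked = true; }

    void unlock() { locked = false; }

    int size() { return grammars.size(); }
}
```

An application would seed the pool with its preparsed grammars, lock it, and
then hand it to the parser through the grammar pool property.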

Regards,

Neeraj B.
Sun Microsystems, Inc.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
