Re: [Xerces2] Design Decisions (LONG)

Andy Clark Wed, 04 Jul 2001 19:58:57 -0700
Elena Litani wrote:
> Hmmm.. We do plan to implement DOM L3 specs and although we did not
> discuss implementing Abstract Schema [AS] module (that allows editing
> and querying grammars), the APIs are there. Do we want to provide
> another way to access grammar??

Are you confident that the DOM L3 ASLS is able to adequately
model a grammar? I'm not. I'm sure it's sufficient (for some
definition of "sufficient") for DTDs and XML Schema grammars
but I'm not convinced it will be able to model other types
of schema grammar languages.

> I am not sure how DTDContentModelHandler is useful for editor writers.

It's not supposed to be. And creating one that is useful
causes the exponential interface explosion that we've seen
from my various proposals. I'm not against providing
something useful for editor writers but that is not the
primary purpose of the interface nor should it be.

> In the world of XML Schemas, you better also provide support for editing
> XML Schemas as well as DTDs, and DTDContentModelHandler provides way too
> little information..

That's why "DTD" is in the name of the interface: because
it's for DTDs, *not* XML Schemas.

> My point is, I believe if we ever want to provide in Xerces APIs for
> editing/querying AS (DTD, XML Schemas, other grammars), we better do it
> by implementing standard DOM APIs.

We "could" do it that way. Whether we "better" do it that
way is open to debate.

> > [2] Pass Base URI to startEntity Callback
> > [...]
> 
> Can baseSystemId be null?

Yes, that would signify that there is no known base system
identifier. However, I'm still trying to think of a situation
where the systemId is *relative* but no base systemId is
provided. Can anybody think of a use case?

The point of passing this information is so that we don't
munge data. The systemId parameter would be the exact
identifier text that was specified in the declaration.
In this way, the callee has access to all of the
information. Without it we have two options: pass in only
the systemId or pass in the expanded systemId. In the
first case the callee doesn't have enough information to
do something meaningful and in the second case the callee
is not actually seeing what was defined in the decl.
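
To make the trade-off concrete, here is a minimal sketch of what passing
both values could look like. The EntityHandler interface, the SystemIds
helper, and the example URIs are hypothetical names made up for
illustration; this is not the actual Xerces2 signature. The callee gets
the literal systemId exactly as declared and can still expand it against
the base when it needs to.

```java
import java.net.URI;

/** Hypothetical callback shape; not the actual Xerces2 interface. */
interface EntityHandler {
    /**
     * @param systemId     the identifier text exactly as written in the declaration
     * @param baseSystemId the base to resolve against, or null if none is known
     */
    void startEntity(String name, String publicId,
                     String systemId, String baseSystemId);
}

/** Hypothetical helper showing what the callee can do with both values. */
final class SystemIds {
    private SystemIds() {}

    /**
     * Expands systemId against baseSystemId. Returns systemId unchanged
     * when no base is known. Assumes both strings are valid URIs;
     * URI.create throws IllegalArgumentException otherwise.
     */
    static String expand(String systemId, String baseSystemId) {
        if (systemId == null) {
            return null;
        }
        if (baseSystemId == null) {
            return systemId;
        }
        return URI.create(baseSystemId).resolve(systemId).toString();
    }
}

final class BaseUriDemo {
    public static void main(String[] args) {
        // The handler sees the literal text and can expand it on demand.
        EntityHandler handler = (name, publicId, systemId, baseSystemId) ->
            System.out.println(name + ": literal=" + systemId
                + " expanded=" + SystemIds.expand(systemId, baseSystemId));

        handler.startEntity("chapter1", null,
            "chapters/ch1.xml", "http://example.org/book/main.xml");
    }
}
```

With only the expanded form, the literal "chapters/ch1.xml" could not be
recovered; with only the literal form, the callee could not locate the
entity.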

Yes, that information would be passed via the DTD handler
(add that to my list of things that need to pass the base
systemId) but then you're creating a dependency that you
need to use both interfaces and keep state. The parser is
already keeping that state, so why not pass it along since
we have already done the work? IMHO.

> > [4] Remove Dependence on SAX
> 
> If we are removing dependence on SAX, should we consider using DOM L3
> APIs for error handling, entity resolver, input source?

You're proposing that we remove one dependency to add
another similar dependency. I'm trying to remove the
dependency altogether, not just switch it to one that
looks "better", even if that were the case.

> Andy, you've mentioned that SAX raises a lot of problems, could you
> expand on that? ..I am not sure why do we need to define again the same
> interfaces that exist in both DOM and SAX??

SAX and DOM are lossy. We're trying to avoid loss of
document information by passing as much as possible
(within reason). The SAX approach is better suited to
building a parser because DOM cannot be used once the
document size increases beyond size N, where N is
some arbitrary large number. But even SAX can't be
used directly because 1) it loses needed information
in its handler interfaces; 2) the pipeline it creates
is read-only (e.g. Attributes); etc.

My proposal is actually more radical than it first
appears. I'm also suggesting that we take this
opportunity to break backwards compatibility in order
to do the right thing. Take, for example, parsing: the
base XML parser class relies on SAX for setting
features/properties, entity resolution, error handling,
and parsing. Why would a DOM parser need such a
dependency? No real reason.

So what is the "right" thing? Depends on who you
ask. If you ask the SAX people, they might say that you
should use the parser factory helper classes to create
the parser and then use the parser and handler
interfaces that SAX defines. The DOM people might say that
the right way is through the load/save component of DOM L3.
Or they both might say to use JAXP to create the parser
and parse documents. In none of these cases would I
consider instantiating the parser class directly "the
right thing to do".
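
For reference, the JAXP route mentioned above looks roughly like this.
It is a minimal sketch using only the standard javax.xml.parsers
factories; the JaxpDemo class and its method names are made up for the
example. Note that the caller never names a concrete parser class.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative class name; only the JAXP, SAX and DOM calls are standard. */
final class JaxpDemo {
    private JaxpDemo() {}

    /** The "SAX way": a factory-created parser drives a handler. */
    static int countElementsWithSax(String xml) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parse failed", e);
        }
        return count[0];
    }

    /** The "DOM way": a factory-created builder returns a tree. */
    static String rootNameWithDom(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }
}
```

Which parser implementation sits behind the factories is a deployment
detail, which is the point: user code depends on the standard API only.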

By defining our own fully-independent, internal API we 
remove all dependencies (because SAX and DOM are not the 
*only* way to output the document data from parsing an 
XML document) and improve the ability to layer parser 
components and configurations without carrying along 
unneeded dependencies. People using the SAX parser 
would do things the "SAX way"; people using the DOM 
parser would do things the "DOM way"; and none of them 
would have to do some things one way and do other 
things other ways just because we have built in a 
cross-dependency.

Only people going *beyond* what is possible with the
"standard" interfaces and taking advantage of the
native APIs would ever know that there is a set of
interfaces that look awfully similar to ones already
present in SAX and DOM. Once the wrapper is built on
top of the XNI stuff to expose it in the SAX way or
the DOM way, people won't even know it's there.
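
As a rough illustration of that layering, here is a sketch of a wrapper
that exposes a native callback stream to a SAX client. The
NativeDocumentHandler interface, SaxAdapter, and AdapterDemo are
hypothetical and deliberately tiny; they are not the proposed XNI
interfaces. Only the org.xml.sax types are real.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical native callback interface with no SAX types in it. */
interface NativeDocumentHandler {
    void startElement(String name);
    void endElement(String name);
}

/** Wrapper layered on top: forwards native events to a SAX ContentHandler. */
final class SaxAdapter implements NativeDocumentHandler {
    private final ContentHandler sax;

    SaxAdapter(ContentHandler sax) {
        this.sax = sax;
    }

    @Override
    public void startElement(String name) {
        try {
            // No namespace or attributes in this simplified sketch.
            sax.startElement("", name, name, new AttributesImpl());
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    public void endElement(String name) {
        try {
            sax.endElement("", name, name);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class AdapterDemo {
    private AdapterDemo() {}

    /** Simulates a native pipeline and returns what the SAX client observed. */
    static String run() {
        StringBuilder seen = new StringBuilder();
        // The SAX client only ever sees org.xml.sax interfaces.
        DefaultHandler saxClient = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                seen.append('<').append(qName).append('>');
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                seen.append("</").append(qName).append('>');
            }
        };

        NativeDocumentHandler pipeline = new SaxAdapter(saxClient);
        pipeline.startElement("doc");
        pipeline.startElement("p");
        pipeline.endElement("p");
        pipeline.endElement("doc");
        return seen.toString();
    }
}
```

The dependency points one way only: the adapter knows about SAX, the
native interface does not. A DOM-building wrapper could sit on the same
native interface without pulling in SAX at all.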

Can you tell I'm starting to feel pretty strongly about
this point? ;)

-- 
Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]
