Re: [Xerces2] Design Decisions (LONG)

Ted Leung Fri, 06 Jul 2001 20:49:28 -0700

----- Original Message -----
From: "Andy Clark" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, July 02, 2001 11:42 PM
Subject: [Xerces2] Design Decisions (LONG)


> This list is eerily quiet these days with regard to Xerces2
> design and development. Hmmm... Well, I'm going to raise a
> few more design points and then make some unilateral
> decisions unless there's some discussion or objections.
>
> [1] DTD Handler Interfaces
>
> We've had some discussion but no resolution on what is to
> become of the XMLDTDHandler and XMLDTDContentModelHandler
> interfaces. Obviously the right balance of information and
> usefulness is beyond our reach.
>
> I'm now leaning towards Glenn's earlier suggestion that we
> can provide DTD information needed to DTD editor writers
> (arguably a very small percentage of the parser user base)
> via the SAX xml-string property. I'm not suggesting that
> we *will* implement this before rolling out Xerces2, merely
> that we *can* support communicating more information in
> the future through this (or a similar) mechanism.
>
> I'm still torn between 1 and 2 interfaces so I'm gonna
> stay with 2 separate interfaces. However, I would make
> the following changes to the XMLDTDContentModelHandler
> interface, in an attempt to provide a better callback
> interleaving with start/endEntity.
>
>   public interface XMLDTDContentModelHandler {
>
>     public static final short OCCURS_ZERO_OR_MORE = 0;
>     public static final short OCCURS_ZERO_OR_ONE = 1;
>     public static final short OCCURS_ONE_OR_MORE = 2;
>
>     public static final short SEPARATOR_CHOICE = 3;
>     public static final short SEPARATOR_SEQUENCE = 4;
>
>     public void startContentModel(String elementName) // *
>       throws XNIException;
>
>     public void any() throws XNIException; // +
>     public void empty() throws XNIException; // +
>
>     public void startGroup() throws XNIException; // rename
>     public void pcdata() throws XNIException; // +
>     public void element(String name) throws XNIException; // rename
>     public void occurs(short occurs) throws XNIException; // rename
>     public void separator(short separator) throws XNIException; // rename
>     public void endGroup() throws XNIException; // rename
>
>     public void endContentModel() throws XNIException;
>
>   } // interface XMLDTDContentModelHandler


-0 I still think that editor writing is out of the scope of the parser,
but I may be the only one who thinks this.
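
To make the proposed callback interleaving concrete, here is a minimal sketch of a handler that rebuilds an element declaration from the callbacks. The interface is copied from the proposal above; the XNIException stand-in, the DeclPrinter class, and the exact order in which a scanner would fire the callbacks (in particular, occurs() after endGroup()) are assumptions made for illustration, not something the proposal specifies.

```java
class ContentModelDemo {
    // Stand-in for the XNI exception type; assumed unchecked for this sketch.
    static class XNIException extends RuntimeException {}

    // The interface as proposed in the quoted message.
    interface XMLDTDContentModelHandler {
        short OCCURS_ZERO_OR_MORE = 0;
        short OCCURS_ZERO_OR_ONE = 1;
        short OCCURS_ONE_OR_MORE = 2;
        short SEPARATOR_CHOICE = 3;
        short SEPARATOR_SEQUENCE = 4;

        void startContentModel(String elementName) throws XNIException;
        void any() throws XNIException;
        void empty() throws XNIException;
        void startGroup() throws XNIException;
        void pcdata() throws XNIException;
        void element(String name) throws XNIException;
        void occurs(short occurs) throws XNIException;
        void separator(short separator) throws XNIException;
        void endGroup() throws XNIException;
        void endContentModel() throws XNIException;
    }

    // Hypothetical handler: turns the callback stream back into DTD text.
    static class DeclPrinter implements XMLDTDContentModelHandler {
        final StringBuilder out = new StringBuilder();

        public void startContentModel(String elementName) {
            out.append("<!ELEMENT ").append(elementName).append(' ');
        }
        public void any() { out.append("ANY"); }
        public void empty() { out.append("EMPTY"); }
        public void startGroup() { out.append('('); }
        public void pcdata() { out.append("#PCDATA"); }
        public void element(String name) { out.append(name); }
        public void occurs(short occurs) {
            out.append(occurs == OCCURS_ZERO_OR_MORE ? '*'
                     : occurs == OCCURS_ZERO_OR_ONE ? '?' : '+');
        }
        public void separator(short separator) {
            out.append(separator == SEPARATOR_CHOICE ? '|' : ',');
        }
        public void endGroup() { out.append(')'); }
        public void endContentModel() { out.append('>'); }
    }

    // Simulates the calls a scanner might make for <!ELEMENT doc (a,(b|c)*)>.
    static String render() {
        DeclPrinter h = new DeclPrinter();
        h.startContentModel("doc");
        h.startGroup();
        h.element("a");
        h.separator(XMLDTDContentModelHandler.SEPARATOR_SEQUENCE);
        h.startGroup();
        h.element("b");
        h.separator(XMLDTDContentModelHandler.SEPARATOR_CHOICE);
        h.element("c");
        h.endGroup();
        h.occurs(XMLDTDContentModelHandler.OCCURS_ZERO_OR_MORE);
        h.endGroup();
        h.endContentModel();
        return h.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render()); // prints <!ELEMENT doc (a,(b|c)*)>
    }
}
```

Because each callback maps to one token of the declaration, a handler like this needs no state beyond the output buffer.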

> [2] Pass Base URI to startEntity Callback
>
> In the continuing effort to pass as much information via
> XNI as possible, I would suggest passing the base systemId
> when calling the startEntity method in all of the handlers
> that define this method.
>
> Therefore, the method would have the following prototype:
>
>   public void startEntity(String name,
>                           String publicId, String systemId,
>                           String baseSystemId, // +
>                           String detectedEncoding)
>     throws XNIException;
>
> For convenience, would it be useful to pass the
> expanded systemId as well?

We should only pass the expanded systemId if there's no way to construct
it from the rest of the data -- which there is in this case.
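
As a rough illustration of constructing it from the rest of the data, a handler could expand the systemId against the baseSystemId itself. This is only a sketch: the expand() helper and the example.org URIs are made up for illustration, and it leans on java.net.URI, which rejects strings containing characters such as spaces, so a real parser would likely need its own URI handling.

```java
import java.net.URI;

class ExpandSystemIdDemo {
    // Hypothetical helper: derives the expanded systemId from the two
    // values that the proposed startEntity callback already passes.
    static String expand(String systemId, String baseSystemId) {
        if (baseSystemId == null) {
            return systemId; // nothing to resolve against
        }
        // URI.resolve returns systemId unchanged when it is already absolute.
        return URI.create(baseSystemId).resolve(systemId).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/docs/book.xml";
        // prints http://example.org/docs/chapters/ch1.xml
        System.out.println(expand("chapters/ch1.xml", base));
        // prints http://example.org/dtd/book.dtd
        System.out.println(expand("../dtd/book.dtd", base));
    }
}
```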

> [3] More Parser Interfaces
>
> One of the problems I've run into (and I've mentioned
> this before but I'll describe it again) is that while
> we have a way to construct parser pipelines, we don't
> have a way of actually initiating the pipeline in a
> generic fashion.
>
> In short, as Ted suggested, it would be cool if we
> could define a pipeline dynamically. We can do most of
> this today *except* for actually telling the scanner to
> start scanning the input. Without this we can't swap
> scanners arbitrarily. And I want/need to be able to
> do this! I don't want to have to rewrite an entire
> parser configuration just to change from the fully
> conformant scanner to a stripped-down "lite" scanner.
>
> Which means that I need to define a new interface for
> the document and DTD scanners. So here's a thought:
>
>   public interface XMLDocumentScanner {
>     public void startDocument(InputSource source)
>       throws IOException;
>     public boolean scanDocument(boolean complete)
>       throws XNIException, IOException;
>   }
>
>   public interface XMLDTDScanner {
>
>     public void startInternalSubset(InputSource source)
>       throws IOException;
>     public boolean scanInternalSubset(boolean complete,
>                                       boolean standalone,
>                                       boolean hasExternalDTD)
>       throws XNIException, IOException;
>
>     public void startExternalSubset(InputSource source)
>       throws IOException;
>     public boolean scanExternalSubset(boolean complete)
>       throws XNIException, IOException;
>
>   } // interface XMLDTDScanner
>
> Right now the reference implementation of the scanners
> depends on having an XMLEntityManager to handle the
> entities and accessing an entity scanner capable of
> tokenizing the lowest level input. So there is no
> need to pass the input source to the scanners because
> they get them from the entity manager. Of course,
> making this kind of change impacts the scanners
> directly. And I still don't have my head around how
> this changes things.

Can we break this out as a separate discussion?  I'd really like to
see this work, and I think I have some time to put towards making
this work.
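
As a starting point for that discussion, here is a minimal sketch of what "initiating the pipeline in a generic fashion" could look like with the proposed XMLDocumentScanner interface. The interface is taken from the proposal; everything else is assumed for illustration: the XNIException stand-in, the TagCountingScanner stub (which only reads a character stream and merely counts '<' characters), the parse() driver, and the reading of scanDocument's arguments (complete=true meaning "scan to the end", return value meaning "more input may remain").

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import org.xml.sax.InputSource;

class ScannerSwapDemo {
    // Stand-in for the XNI exception type; assumed unchecked for this sketch.
    static class XNIException extends RuntimeException {}

    // The scanner interface as proposed in the quoted message.
    interface XMLDocumentScanner {
        void startDocument(InputSource source) throws IOException;
        boolean scanDocument(boolean complete) throws XNIException, IOException;
    }

    // Hypothetical stub scanner: counts '<' characters instead of parsing.
    static class TagCountingScanner implements XMLDocumentScanner {
        private Reader reader;
        int tagOpens;

        public void startDocument(InputSource source) {
            reader = source.getCharacterStream(); // stub ignores byte streams
            tagOpens = 0;
        }

        public boolean scanDocument(boolean complete) throws IOException {
            int c;
            while ((c = reader.read()) != -1) {
                if (c == '<') {
                    tagOpens++;
                    if (!complete) {
                        return true; // assumed meaning: more input may remain
                    }
                }
            }
            return false; // end of input reached
        }
    }

    // Generic driver: works with any scanner, so implementations can be
    // swapped without touching the rest of the configuration.
    static void parse(XMLDocumentScanner scanner, InputSource source)
            throws IOException {
        scanner.startDocument(source);
        scanner.scanDocument(true);
    }

    // Convenience wrapper that runs the stub over an in-memory document.
    static int countTags(String xml) {
        TagCountingScanner scanner = new TagCountingScanner();
        try {
            parse(scanner, new InputSource(new StringReader(xml)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return scanner.tagOpens;
    }

    public static void main(String[] args) {
        System.out.println(countTags("<a><b/></a>")); // prints 3
    }
}
```

The driver only depends on the interface, so a "lite" scanner would be passed to parse() exactly the same way as a fully conformant one.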

> [4] Remove Dependence on SAX
>
> I kept the most controversial 'til last so that I only
> get feedback from the real hardcore contributors. Or
> from people who have nothing better to do than to read
> really long posts on the mailing list... :)
>
> Seriously, a point was raised at the Xerces2 Workshop: why
> *do* we use the SAX stuff but only where appropriate? Why
> not either extend SAX or just remove all dependence on SAX
> and make a completely standalone set of interfaces?
> Extending SAX raises a lot of problems that I won't go
> into here; suffice it to say that it's not the
> best option.
>
> Since the Workshop I've been starting to agree with the
> argument and want to remove dependence on all other APIs
> from XNI. Therefore, I would suggest that we remove all
> use of SAX throughout XNI. That means that we would
> invent our own entity resolver, input source, etc. We
> can make them look an awful lot like SAX so that it
> bridges the learning gap between the two, though.
>
> Diverging from SAX also lets us fix the problems that
> SAX has. For example, there was a recent suggestion on
> the xml-dev mailing list about extending SAX in the
> future so that the entity resolver is passed the base
> URI as well to allow the resolver to do more. I'm all
> for this parameter. And I'm sure that there are more
> instances like this.
>
> So here's a proposal (in an abbreviated form):
>
>   public class XMLInputSource {
>
>     protected String fPublicId;
>     protected String fSystemId;
>     protected String fBaseSystemId; // +
>     protected String fExpandedSystemId; // +
>
>     protected String fEncoding;
>
>     protected InputStream fByteStream;
>     protected Reader fCharStream;
>
>   } // class XMLInputSource
>
>   public class XNIParseException extends XNIException {
>     protected XMLLocator fLocation;
>   }
>
>   public interface XMLLocator {
>
>     public String getPublicId();
>     public String getSystemId();
>     public String getBaseSystemId(); // +
>     public String getExpandedSystemId(); // +
>
>     public int getLineNumber();
>     public int getColumnNumber();
>
>   } // interface XMLLocator
>
>   public interface XMLErrorHandler {
>
>     public void warning(XNIParseException ex) throws XNIException;
>     public void error(XNIParseException ex) throws XNIException;
>     public void fatalError(XNIParseException ex) throws XNIException;
>
>   } // interface XMLErrorHandler
>
>   public interface XMLEntityResolver {
>     public XMLInputSource resolveEntity(String publicId,
>                                         String systemId,
>                                         String baseSystemId) // +
>       throws XNIException, IOException;
>   }
>
> Am I missing anything?

+0.  Cleanliness on this is okay with me.
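
For what it's worth, here is a small sketch of how a resolver might use the extra baseSystemId parameter: an offline catalog keyed by the expanded system ID. The interface and the field names follow the proposal; the XNIException stand-in, the CatalogResolver class, its add() method, the package-private fields, and the example.org URIs are all assumptions for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class ResolverDemo {
    // Stand-in for the XNI exception type; assumed unchecked for this sketch.
    static class XNIException extends RuntimeException {}

    // Abbreviated input source; field names follow the proposal, but the
    // fields are package-private here to keep the sketch short.
    static class XMLInputSource {
        String fPublicId;
        String fSystemId;
        String fBaseSystemId;
        String fExpandedSystemId;
        Reader fCharStream;
    }

    // The resolver interface as proposed in the quoted message.
    interface XMLEntityResolver {
        XMLInputSource resolveEntity(String publicId, String systemId,
                                     String baseSystemId)
            throws XNIException, IOException;
    }

    // Hypothetical resolver: serves entities from an in-memory catalog
    // keyed by the expanded (absolute) system ID.
    static class CatalogResolver implements XMLEntityResolver {
        private final Map<String, String> catalog = new HashMap<>();

        void add(String expandedSystemId, String content) {
            catalog.put(expandedSystemId, content);
        }

        public XMLInputSource resolveEntity(String publicId, String systemId,
                                            String baseSystemId) {
            XMLInputSource source = new XMLInputSource();
            source.fPublicId = publicId;
            source.fSystemId = systemId;
            source.fBaseSystemId = baseSystemId;
            source.fExpandedSystemId = (baseSystemId == null)
                ? systemId
                : URI.create(baseSystemId).resolve(systemId).toString();
            String content = catalog.get(source.fExpandedSystemId);
            if (content != null) {
                source.fCharStream = new StringReader(content);
            }
            return source; // fCharStream stays null on a catalog miss
        }
    }

    public static void main(String[] args) {
        CatalogResolver resolver = new CatalogResolver();
        resolver.add("http://example.org/doc/ent/ch1.xml", "<chapter/>");
        XMLInputSource source = resolver.resolveEntity(
            null, "ent/ch1.xml", "http://example.org/doc/book.xml");
        System.out.println(source.fExpandedSystemId);
    }
}
```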

> [*] Start the Discussion!
>
> Here's where you add your 2 cents (or yen, etc). I'll keep
> this topic open throughout this week and then, barring any
> problems, I'll start enacting the changes next week.
>
> --
> Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>

