This list is eerily quiet these days with regard to Xerces2
design and development. Hmmm... Well, I'm going to raise a
few more design points and then make some unilateral
decisions unless there's some discussion or objections.

[1] DTD Handler Interfaces

We've had some discussion but no resolution on what is to
become of the XMLDTDHandler and XMLDTDContentModelHandler
interfaces. Obviously the right balance of information and
usefulness is beyond our reach.

I'm now leaning towards Glenn's earlier suggestion that we
can provide the DTD information that DTD editor writers need
(arguably a very small percentage of the parser user base)
via the SAX xml-string property. I'm not suggesting that
we *will* implement this before rolling out Xerces2, merely
that we *can* support communicating more information in
the future through this (or a similar) mechanism.

I'm still torn between 1 and 2 interfaces so I'm gonna
stay with 2 separate interfaces. However, I would make
the following changes to the XMLDTDContentModelHandler
interface, in an attempt to provide a better callback
interleaving with start/endEntity.

  public interface XMLDTDContentModelHandler {

    public static final short OCCURS_ZERO_OR_MORE = 0;
    public static final short OCCURS_ZERO_OR_ONE = 1;
    public static final short OCCURS_ONE_OR_MORE = 2;

    public static final short SEPARATOR_CHOICE = 3;
    public static final short SEPARATOR_SEQUENCE = 4;

    public void startContentModel(String elementName) // *
      throws XNIException;

    public void any() throws XNIException; // +
    public void empty() throws XNIException; // +

    public void startGroup() throws XNIException; // rename
    public void pcdata() throws XNIException; // +
    public void element(String name) throws XNIException; // rename
    public void occurs(short occurs) throws XNIException; // rename
    public void separator(short separator)
      throws XNIException; // rename
    public void endGroup() throws XNIException; // rename

    public void endContentModel() throws XNIException;

  } // interface XMLDTDContentModelHandler
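
To make the intended callback order concrete, here is a rough sketch of
a handler that rebuilds the declaration text from the callbacks. It
assumes that occurs() is reported *after* the element or group it
modifies, which the interface above does not spell out. XNIException is
re-declared locally as a stand-in, and ContentModelPrinter and
ContentModelDemo are hypothetical names used only for illustration.

```java
class ContentModelDemo {
    /** Replays the assumed callbacks for (title,(para|list)*). */
    static String replay() {
        ContentModelPrinter p = new ContentModelPrinter();
        p.startContentModel("section");
        p.startGroup();
        p.element("title");
        p.separator(XMLDTDContentModelHandler.SEPARATOR_SEQUENCE);
        p.startGroup();
        p.element("para");
        p.separator(XMLDTDContentModelHandler.SEPARATOR_CHOICE);
        p.element("list");
        p.endGroup();
        // Assumption: the occurrence follows the group it applies to.
        p.occurs(XMLDTDContentModelHandler.OCCURS_ZERO_OR_MORE);
        p.endGroup();
        p.endContentModel();
        return p.toString();
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}

// Local stand-in so the sketch compiles on its own.
class XNIException extends RuntimeException {
    XNIException(String message) { super(message); }
}

// Mirrors the interface proposed above.
interface XMLDTDContentModelHandler {
    short OCCURS_ZERO_OR_MORE = 0;
    short OCCURS_ZERO_OR_ONE = 1;
    short OCCURS_ONE_OR_MORE = 2;
    short SEPARATOR_CHOICE = 3;
    short SEPARATOR_SEQUENCE = 4;

    void startContentModel(String elementName) throws XNIException;
    void any() throws XNIException;
    void empty() throws XNIException;
    void startGroup() throws XNIException;
    void pcdata() throws XNIException;
    void element(String name) throws XNIException;
    void occurs(short occurs) throws XNIException;
    void separator(short separator) throws XNIException;
    void endGroup() throws XNIException;
    void endContentModel() throws XNIException;
}

/** Hypothetical handler: appends one piece of DTD syntax per callback. */
class ContentModelPrinter implements XMLDTDContentModelHandler {
    private final StringBuilder out = new StringBuilder();

    public void startContentModel(String elementName) {
        out.setLength(0);
        out.append("<!ELEMENT ").append(elementName).append(' ');
    }
    public void any() { out.append("ANY"); }
    public void empty() { out.append("EMPTY"); }
    public void startGroup() { out.append('('); }
    public void pcdata() { out.append("#PCDATA"); }
    public void element(String name) { out.append(name); }
    public void occurs(short occurs) {
        switch (occurs) {
            case OCCURS_ZERO_OR_MORE: out.append('*'); break;
            case OCCURS_ZERO_OR_ONE: out.append('?'); break;
            case OCCURS_ONE_OR_MORE: out.append('+'); break;
            default: throw new XNIException("unknown occurrence: " + occurs);
        }
    }
    public void separator(short separator) {
        switch (separator) {
            case SEPARATOR_CHOICE: out.append('|'); break;
            case SEPARATOR_SEQUENCE: out.append(','); break;
            default: throw new XNIException("unknown separator: " + separator);
        }
    }
    public void endGroup() { out.append(')'); }
    public void endContentModel() { out.append('>'); }
    @Override public String toString() { return out.toString(); }
}
```

Under that ordering assumption, replay() yields
<!ELEMENT section (title,(para|list)*)>. If occurs() were instead
delivered before endGroup(), the printer would need to buffer it.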

[2] Pass Base URI to startEntity Callback

In the continuing effort to pass as much information via
XNI as possible, I would suggest passing the base systemId
when calling the startEntity method in all of the handlers
that define this method.

Therefore, the method would have the following prototype:

  public void startEntity(String name,
                          String publicId, String systemId,
                          String baseSystemId, // +
                          String detectedEncoding)
    throws XNIException;

For convenience, would it be useful to also pass the
expanded systemId as well?
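
For reference, this is roughly what a handler would have to do itself
if only systemId and baseSystemId were passed. It is a minimal sketch
using java.net.URI; SystemIdExpander and expand() are hypothetical
names, and a real implementation would also need to cope with system
identifiers that are not valid URIs (spaces, Windows paths, etc.).

```java
import java.net.URI;

class SystemIdExpander {
    /**
     * Hypothetical helper: resolves systemId against baseSystemId.
     * Absolute identifiers, or a missing base, are returned unchanged.
     */
    static String expand(String systemId, String baseSystemId) {
        if (systemId == null) {
            return null;
        }
        URI uri = URI.create(systemId);
        if (uri.isAbsolute() || baseSystemId == null) {
            return uri.toString();
        }
        // URI.resolve also normalizes "." and ".." path segments.
        return URI.create(baseSystemId).resolve(uri).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/docs/book.xml";
        System.out.println(expand("chapters/ch1.xml", base));
        System.out.println(expand("../dtd/book.dtd", base));
        System.out.println(expand("file:///tmp/a.ent", base));
    }
}
```

Passing the expanded systemId in the callback would save every handler
from repeating this logic, at the cost of one more parameter.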

[3] More Parser Interfaces

One of the problems I've run into (and I've mentioned
this before but I'll describe it again) is that while
we have a way to construct parser pipelines, we don't
have a way of actually initiating the pipeline in a
generic fashion.

In short, as Ted suggested, it would be cool if we
could define a pipeline dynamically. We can do most of
this today *except* for actually telling the scanner to
start scanning the input. Without this we can't swap
scanners arbitrarily. And I want/need to be able to
do this! I don't want to have to rewrite an entire
parser configuration just to change from the fully
conformant scanner to a stripped-down "lite" scanner.

Which means that I need to define a new interface for
the document and DTD scanners. So here's a thought:

  public interface XMLDocumentScanner {
    public void startDocument(InputSource source)
      throws IOException;
    public boolean scanDocument(boolean complete)
      throws XNIException, IOException;
  }

  public interface XMLDTDScanner {

    public void startInternalSubset(InputSource source)
      throws IOException;
    public boolean scanInternalSubset(boolean complete,
                                      boolean standalone,
                                      boolean hasExternalDTD)
      throws XNIException, IOException;

    public void startExternalSubset(InputSource source)
      throws IOException;
    public boolean scanExternalSubset(boolean complete)
      throws XNIException, IOException;

  } // interface XMLDTDScanner

Right now the reference implementation of the scanners
depends on having an XMLEntityManager to handle the
entities and on access to an entity scanner capable of
tokenizing the lowest-level input. So there is no
need to pass the input source to the scanners because
they get it from the entity manager. Of course,
making this kind of change impacts the scanners
directly. And I still don't have my head around how
this changes things.
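
As a thought experiment, here is how a generic driver might look once
such an interface exists. This is only a sketch under assumptions the
proposal does not settle: that scanDocument(false) scans one unit and
returns true while more input remains, and that scanDocument(true)
scans to the end and returns false. WordScanner is a toy stand-in (it
just splits on whitespace), XNIException is re-declared locally, and
ScannerDemo/pump() are hypothetical names.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;
import org.xml.sax.InputSource;

class ScannerDemo {
    /** Wraps a string in a SAX InputSource (convenience for the demo). */
    static InputSource source(String text) {
        return new InputSource(new StringReader(text));
    }

    /** Generic driver: it knows only the interface, not the scanner class.
     *  Returns the number of scanDocument calls that were needed. */
    static int pump(XMLDocumentScanner scanner, InputSource source, boolean complete) {
        try {
            scanner.startDocument(source);
            int calls = 1;
            while (scanner.scanDocument(complete)) {
                calls++;
            }
            return calls;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        WordScanner scanner = new WordScanner();
        int calls = pump(scanner, source("<a> text </a>"), false);
        System.out.println(calls + " calls, tokens " + scanner.seen);
    }
}

// Local stand-in so the sketch compiles on its own.
class XNIException extends RuntimeException {
    XNIException(String message) { super(message); }
}

// Mirrors the proposed interface.
interface XMLDocumentScanner {
    void startDocument(InputSource source) throws IOException;
    boolean scanDocument(boolean complete) throws XNIException, IOException;
}

/** Toy scanner: treats each whitespace-separated word as one scan unit. */
class WordScanner implements XMLDocumentScanner {
    private StringTokenizer tokens;
    final List<String> seen = new ArrayList<>();

    public void startDocument(InputSource source) throws IOException {
        Reader reader = source.getCharacterStream();
        if (reader == null) {
            throw new IOException("sketch only supports character streams");
        }
        StringBuilder text = new StringBuilder();
        for (int c = reader.read(); c != -1; c = reader.read()) {
            text.append((char) c);
        }
        tokens = new StringTokenizer(text.toString());
        seen.clear();
    }

    public boolean scanDocument(boolean complete) throws XNIException {
        if (tokens == null) {
            throw new XNIException("startDocument was not called");
        }
        do {
            if (!tokens.hasMoreTokens()) {
                return false;
            }
            seen.add(tokens.nextToken());
        } while (complete); // assumption: complete == true scans to the end
        return tokens.hasMoreTokens();
    }
}
```

The point is that pump() could be handed the fully conformant scanner
or a "lite" one without any other part of the configuration changing.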

[4] Remove Dependence on SAX

I kept the most controversial 'til last so that I only
get feedback from the real hardcore contributors. Or
from people who have nothing better to do than to read
really long posts on the mailing list... :)

Seriously, a point was raised at the Xerces2 Workshop: why 
*do* we use the SAX stuff but only where appropriate? Why 
not either extend SAX or just remove all dependence on SAX 
and make a completely standalone set of interfaces? 
Extending SAX raises a lot of problems that I won't go
into here; suffice it to say that it's not the
best option.

Since the Workshop I've been starting to agree with the 
argument and want to remove dependence on all other APIs 
from XNI. Therefore, I would suggest that we remove all
use of SAX throughout XNI. That means that we would
invent our own entity resolver, input source, etc. We 
can make them look an awful lot like SAX so that it 
bridges the learning gap between the two, though.

Diverging from SAX also lets us fix the problems that
SAX has. For example, there was a recent suggestion on 
the xml-dev mailing list about extending SAX in the
future so that the entity resolver is passed the base
URI as well to allow the resolver to do more. I'm all
for this parameter. And I'm sure that there are more
instances like this.

So here's a proposal (in an abbreviated form):

  public class XMLInputSource {

    protected String fPublicId;
    protected String fSystemId;
    protected String fBaseSystemId; // +
    protected String fExpandedSystemId; // +

    protected String fEncoding;

    protected InputStream fByteStream;
    protected Reader fCharStream;

  } // class XMLInputSource

  public class XNIParseException extends XNIException {
    protected XMLLocator fLocation;
  }

  public interface XMLLocator {

    public String getPublicId();
    public String getSystemId();
    public String getBaseSystemId(); // +
    public String getExpandedSystemId(); // +

    public int getLineNumber();
    public int getColumnNumber();

  } // interface XMLLocator

  public interface XMLErrorHandler {

    public void warning(XNIParseException ex) throws XNIException;
    public void error(XNIParseException ex) throws XNIException;
    public void fatalError(XNIParseException ex) throws XNIException;

  } // interface XMLErrorHandler

  public interface XMLEntityResolver {
    public XMLInputSource resolveEntity(String publicId, 
                                        String systemId,
                                        String baseSystemId) // +
      throws XNIException, IOException;
  }
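
To show what the extra baseSystemId parameter buys a resolver, here is
a rough sketch of one that first consults a public ID catalog and
otherwise resolves the systemId against the base. Everything beyond
the proposed signatures is an assumption made for illustration: the
XMLInputSource constructor and getter, the CatalogResolver class, its
register() method, and the example identifiers. Opening the actual
byte or character stream is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class ResolverDemo {
    /** Resolves one entity and reports the expanded systemId. */
    static String expandedIdFor(String publicId, String systemId, String base) {
        CatalogResolver resolver = new CatalogResolver();
        resolver.register("-//EXAMPLE//DTD Book//EN", "file:///opt/dtds/book.dtd");
        try {
            return resolver.resolveEntity(publicId, systemId, base).getExpandedSystemId();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String base = "http://example.org/docs/book.xml";
        System.out.println(expandedIdFor("-//EXAMPLE//DTD Book//EN", "book.dtd", base));
        System.out.println(expandedIdFor(null, "ch1.ent", base));
    }
}

// Local stand-in so the sketch compiles on its own.
class XNIException extends RuntimeException {
    XNIException(String message) { super(message); }
}

// Fields follow the proposal; constructor and getter are assumptions.
class XMLInputSource {
    protected String fPublicId;
    protected String fSystemId;
    protected String fBaseSystemId;
    protected String fExpandedSystemId;

    XMLInputSource(String publicId, String systemId,
                   String baseSystemId, String expandedSystemId) {
        fPublicId = publicId;
        fSystemId = systemId;
        fBaseSystemId = baseSystemId;
        fExpandedSystemId = expandedSystemId;
    }

    String getExpandedSystemId() { return fExpandedSystemId; }
}

// Mirrors the proposed interface.
interface XMLEntityResolver {
    XMLInputSource resolveEntity(String publicId, String systemId,
                                 String baseSystemId)
        throws XNIException, IOException;
}

/** Hypothetical resolver: catalog lookup first, base-relative fallback. */
class CatalogResolver implements XMLEntityResolver {
    private final Map<String, String> catalog = new HashMap<>();

    void register(String publicId, String location) {
        catalog.put(publicId, location);
    }

    public XMLInputSource resolveEntity(String publicId, String systemId,
                                        String baseSystemId) {
        String expanded = (publicId != null) ? catalog.get(publicId) : null;
        if (expanded == null) {
            if (systemId == null) {
                throw new XNIException("no catalog entry and no systemId");
            }
            URI uri = URI.create(systemId);
            expanded = (uri.isAbsolute() || baseSystemId == null)
                ? uri.toString()
                : URI.create(baseSystemId).resolve(uri).toString();
        }
        return new XMLInputSource(publicId, systemId, baseSystemId, expanded);
    }
}
```

With the SAX signature, the fallback branch is impossible to write
inside the resolver because the base URI never reaches it.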

Am I missing anything?

[*] Start the Discussion!

Here's where you add your 2 cents (or yen, etc). I'll keep
this topic open throughout this week and then, barring any
problems, I'll start enacting the changes next week.

-- 
Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]
