[EMAIL PROTECTED] wrote:
> 
> Internally, we validate by numbers. If the validation was done externally
> via public API, then it would have to be done by strings. This would be
> significantly slower. For instance, a DTD content model is reduced to an
> array of arrays of numbers when it's in DFA form. Internally, those are the
> actual numbers that the scanner stores and passes to the validator to
> validate. If it were done on top of the system, via public APIs, everything
> would have to work in terms of strings that would have to be re-tokenized
> in some way.
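
To check that I am reading the "array of arrays of numbers" description correctly, here is a rough sketch of what I imagine such a table looks like. The class name ContentModelDfa, the -1 convention for "no transition", and the choice of state 0 as the start state are all my own assumptions for illustration, not anything taken from the Xerces source.

```java
// Hypothetical sketch: a content model compiled to a DFA transition table.
// Not the actual Xerces data structure.
class ContentModelDfa {
    // transitions[state][elementId] is the next state, or -1 if not allowed.
    private final int[][] transitions;
    // accepting[state] is true if the content may legally end in that state.
    private final boolean[] accepting;

    ContentModelDfa(int[][] transitions, boolean[] accepting) {
        this.transitions = transitions;
        this.accepting = accepting;
    }

    // Walks the table using the interned element ids of the children.
    boolean matches(int[] childIds) {
        int state = 0; // assumed start state
        for (int id : childIds) {
            if (id < 0 || id >= transitions[state].length) {
                return false; // unknown element id
            }
            state = transitions[state][id];
            if (state < 0) {
                return false; // no transition for this element here
            }
        }
        return accepting[state];
    }
}
```

If that is roughly right, then the validator's inner loop never touches a string, which I take to be the point.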

I don't want to be a pain but I want to understand. The parser sees "P".
It interns it as "7" because "P" is 7 in this invocation of the parser.
It passes the number 7 to the validator. Why couldn't the validator have
figured out that "P" is 7? Is this the expensive operation that we are
avoiding? Is that what you mean by re-tokenization?
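
To make the question concrete, this is the kind of lookup I have in mind. It is only a sketch: NamePool and its methods are names I made up, and I am assuming the pool is essentially a hash map from names to small integers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical string pool; not the actual Xerces class.
class NamePool {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    // Returns the index already assigned to name, or assigns the next free one.
    int intern(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            id = names.size();
            ids.put(name, id);
            names.add(name);
        }
        return id;
    }

    // Maps an index back to its name, e.g. for error messages.
    String nameOf(int id) {
        return names.get(id);
    }
}
```

If the validator called intern("P") itself, the cost would be one hash lookup per name. Is that the cost being avoided, or is there more to it?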

Even if we allow that it is too expensive for the validator to do the
lookup for itself, could there be a relatively thin public API on top of
the validator that could map strings to integers for people that happen
to not be using Xerces directly? In other words it would maintain and
hide the whole string pool thing. Xerces could continue to use the
private integer-based API and share the string pool.
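
Something like the following is what I mean by a thin layer. Every name here (IntValidator, StringValidatorFacade, startElement) is invented for the sketch, and I am assuming the private API takes an interned element id and reports whether it is valid at that point.

```java
import java.util.function.ToIntFunction;

// Hypothetical stand-in for the private, integer-based validator API.
interface IntValidator {
    boolean startElement(int elementId);
}

// Hypothetical thin public layer: callers pass strings, the pool stays hidden.
class StringValidatorFacade {
    private final ToIntFunction<String> intern; // lookup into the shared pool
    private final IntValidator delegate;        // the integer-based validator

    StringValidatorFacade(ToIntFunction<String> intern, IntValidator delegate) {
        this.intern = intern;
        this.delegate = delegate;
    }

    // Maps the name to its pool index, then defers to the integer API.
    boolean startElement(String name) {
        return delegate.startElement(intern.applyAsInt(name));
    }
}
```

The scanner would keep calling the integer API directly, and only outside callers would pay for the string lookup.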

> Or, you have some sort of DOMInputSource that streams
> the data from a DOM tree back into the parser to be validated. The parser
> has an abstraction for where the data comes from, so it can be made to come
> from pretty much anything you want to store it in.

But is there a parser input abstraction at the same logical level as SAX
DocumentHandler or Xalan FormatListener?

> And we are very much making use of abstractions. The validator abstraction
> is just such a thing. Both DTD and Schema validators will work via this
> same abstraction and the scanner does not have to know what it really is
> dealing with.

I agree that that is a cool way of doing it. It just happens to solve a
different problem than the one I need to solve. I need to plug in the
validator at random places in my process chain, ideally without
re-parsing.

-- 
 Paul Prescod  - ISOGEN Consulting Engineer speaking for himself
"Remember, Ginger Rogers did everything that Fred Astaire did,
but she did it backwards and in high heels."
                                               --Faith Whittlesey
