[EMAIL PROTECTED] wrote:
>
> Internally, we validate by numbers. If the validation was done externally
> via public API, then it would have to be done by strings. This would be
> significantly slower. For instance, a DTD content model is reduced to an
> array of arrays of numbers when it's in DFA form. Internally, those are the
> actual numbers that the scanner stores and passes to the validator to
> validate. If it were done on top of the system, via public APIs, everything
> would have to work in terms of strings that would have to be re-tokenized
> in some way.

I don't want to be a pain but I want to understand. The parser sees "P".
It interns it as "7" because "P" is 7 in this invocation of the parser.
It passes the number 7 to the validator. Why couldn't the validator have
figured out that "P" is 7? Is this the expensive operation that we are
avoiding? Is that what you mean by re-tokenization?

Even if we allow that it is too expensive for the validator to do the
lookup for itself, could there be a relatively thin public API on top of
the validator that could map strings to integers for people who happen
not to be using Xerces directly? In other words, it would maintain and
hide the whole string pool thing. Xerces could continue to use the
private integer-based API and share the string pool.

> Or, you have some sort of DOMInputSource that streams
> the data from a DOM tree back into the parser to be validated. The parser
> has an abstraction for where the data comes from, so it can be made to come
> from pretty much anything you want to store it in.

But is there a parser input abstraction at the same logical level as SAX
DocumentHandler or Xalan FormatListener?

> And we are very much making use of abstractions. The validator abstraction
> is just such a thing. Both DTD and Schema validators will work via this
> same abstraction and the scanner does not have to know what it really is
> dealing with.

I agree that that is a cool way of doing it. It just happens to solve a
different problem than the one I need to solve. I need to plug in the
validator at random places in my process chain, ideally without
re-parsing.

-- 
Paul Prescod - ISOGEN Consulting Engineer speaking for himself
"Remember, Ginger Rogers did everything that Fred Astaire did,
but she did it backwards and in high heels."
                                        --Faith Whittlesey
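
A rough sketch, in Java, of the thin string-to-integer layer asked about
above. Every name here (StringPool, IntValidator, StringValidatorFacade)
is hypothetical and invented for illustration; none of it is the actual
Xerces API. It assumes the string pool is nothing more than a name-to-id
table shared by the scanner and the wrapper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical string pool: maps each distinct name to a small integer id. */
final class StringPool {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    /** Returns the id for a name, assigning the next free id on first sight. */
    int intern(String name) {
        Integer id = ids.get(name);
        if (id == null) {
            id = names.size();
            ids.put(name, id);
            names.add(name);
        }
        return id;
    }

    /** Reverse lookup, e.g. for error messages. */
    String nameOf(int id) {
        return names.get(id);
    }
}

/** Stand-in (an assumption) for the private, integer-based validator interface. */
interface IntValidator {
    boolean startElement(int elementId);
}

/** Thin public wrapper: callers pass strings, the pool stays hidden. */
final class StringValidatorFacade {
    private final StringPool pool;
    private final IntValidator delegate;

    StringValidatorFacade(StringPool pool, IntValidator delegate) {
        this.pool = pool;
        this.delegate = delegate;
    }

    /** One hash lookup per call, then the same integer path the scanner uses. */
    boolean startElement(String elementName) {
        return delegate.startElement(pool.intern(elementName));
    }
}
```

In this sketch the extra cost for an outside caller is one hash lookup
per name, while code that already holds the integer id can keep calling
the integer-based interface directly. Whether that lookup is the
"re-tokenization" cost being described is exactly the open question.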