>I don't want to be a pain but I want to understand. The parser sees "P".
>It interns it as "7" because "P" is 7 in this invocation of the parser.
>It passes the number 7 to the validator. Why couldn't the validator have
>figured out that "P" is 7? Is this the expensive operation that we are
>avoiding? Is that what you mean by re-tokenization?

Actually, the validator *is* the thing that assigns unique ids to elements,
attributes, notations, etc. The deal, though, is that as long as we are
still down inside the scanner, the scanner is still dealing with the
elements via their ids, and so it can give the validator a list of children
in a directly digestible format.
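Here is a rough C++ sketch of interning that makes the division of labor concrete. Each distinct name gets a small integer id the first time it is seen, and every later occurrence maps to the same id, so a list of child ids can be handed straight to the validator. `NamePool` and `collectChildIds` are hypothetical names for illustration, not Xerces-C classes. A real scanner would also intern each tag as it scans it, rather than working from a ready-made list of strings.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical string pool (not the Xerces-C class): maps each distinct
// name to a small integer id, assigned in first-seen order.
class NamePool {
public:
    // Returns the existing id for name, or assigns the next free id.
    unsigned int intern(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) return it->second;
        unsigned int id = static_cast<unsigned int>(names_.size());
        ids_.emplace(name, id);
        names_.push_back(name);
        return id;
    }

    // Reverse lookup (throws std::out_of_range for an unknown id),
    // useful when reporting errors.
    const std::string& nameOf(unsigned int id) const { return names_.at(id); }

private:
    std::unordered_map<std::string, unsigned int> ids_;
    std::vector<std::string> names_;
};

// Interns each child tag in document order and returns the resulting ids.
// In a real scanner this happens tag by tag during scanning, so the list is
// already in id form when the parent's end tag is reached.
std::vector<unsigned int> collectChildIds(NamePool& pool,
                                          const std::vector<std::string>& childTags) {
    std::vector<unsigned int> ids;
    ids.reserve(childTags.size());
    for (const auto& tag : childTags) ids.push_back(pool.intern(tag));
    return ids;
}
```

Since "P" is interned only once, both occurrences in a list such as `{"P", "B", "P"}` come back as the same id, and the validator never has to look at the string again.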

>
>Even if we allow that it is too expensive for the validator to do the
>lookup for itself, could there be a relatively thin public API on top of
>the validator that could map strings to integers for people that happen
>to not be using Xerces directly? In other words it would maintain and
>hide the whole string pool thing. Xerces could continue to use the
>private integer-based API and share the string pool.

You can do this too; it's just that we don't make use of it. You can ask the
validator for the id of a particular element or attribute. I was just
saying earlier that we choose not to do it this way because it's twice the
tokenization. But you can certainly do this if you choose to.

In fact, if you look at the base validator abstraction, XMLValidator, there
is a checkContent() method that takes the parent element id and the list
of child element ids to validate. It would be pretty trivial for me to add
a non-virtual version of it that takes the name of a parent element and
the names of the child elements, looks them up internally in the pools,
builds the list of ids, and calls the virtual version, which will vector
off to the actual DTD or Schema implementation.
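The following C++ sketch shows what such a non-virtual wrapper might look like. Everything in it is hypothetical and stands in for `XMLValidator` and a real DTD or Schema content model; it does not reproduce the actual Xerces-C signatures. That covers `ValidatorBase`, `checkContentByName`, the "-1 means valid, otherwise the index of the first bad child" return convention, and the toy derived validator, which only checks that each child is allowed under its parent and ignores ordering.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical validator base; not the real XMLValidator interface.
class ValidatorBase {
public:
    virtual ~ValidatorBase() = default;

    // Virtual, id-based check implemented by each concrete validator.
    // Returns -1 if valid, otherwise the index of the first offending child.
    virtual int checkContent(unsigned int parentId,
                             const std::vector<unsigned int>& childIds) const = 0;

    // Non-virtual convenience wrapper: looks the names up in the pool,
    // builds the id list, then calls the virtual id-based version.
    int checkContentByName(const std::string& parentName,
                           const std::vector<std::string>& childNames) const {
        auto parent = ids_.find(parentName);
        if (parent == ids_.end())
            throw std::invalid_argument("undeclared parent: " + parentName);
        std::vector<unsigned int> childIds;
        childIds.reserve(childNames.size());
        for (std::size_t i = 0; i < childNames.size(); ++i) {
            auto child = ids_.find(childNames[i]);
            if (child == ids_.end()) return static_cast<int>(i);  // undeclared child
            childIds.push_back(child->second);
        }
        return checkContent(parent->second, childIds);
    }

protected:
    // Returns the id for name, assigning a new one on first declaration.
    unsigned int declare(const std::string& name) {
        return ids_.emplace(name, static_cast<unsigned int>(ids_.size())).first->second;
    }

private:
    std::unordered_map<std::string, unsigned int> ids_;
};

// Toy stand-in for a DTD or Schema validator: each parent lists the child
// element ids it accepts, in any order and any number.
class ToyValidator : public ValidatorBase {
public:
    void allow(const std::string& parent, const std::vector<std::string>& children) {
        unsigned int p = declare(parent);
        for (const auto& c : children) allowed_[p].insert(declare(c));
    }

    int checkContent(unsigned int parentId,
                     const std::vector<unsigned int>& childIds) const override {
        auto it = allowed_.find(parentId);
        for (std::size_t i = 0; i < childIds.size(); ++i) {
            if (it == allowed_.end() || it->second.count(childIds[i]) == 0)
                return static_cast<int>(i);
        }
        return -1;
    }

private:
    std::unordered_map<unsigned int, std::unordered_set<unsigned int>> allowed_;
};
```

The wrapper is non-virtual and only touches the pool. As a result, each concrete validator overrides just the id-based check, and the name lookup is written once in the base class.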

I think there used to be such an API, but it got lost in the transition to
the new validator architecture because no one seemed to be using it. But it
could be added back with little effort. It is somewhat complicated by the
fact that you have to provide either prefix:foo or {uri}foo and tell the
method which form you are using (the old version was pre-namespace, so it
never dealt with this). I would assume it would just take two-part names,
where the first part is either a URI or a prefix, and you pass a boolean
that indicates which one it is.
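Here is one way the two-part name could be normalized before the id lookup, sketched in C++ under stated assumptions. `TwoPartName`, `toLookupKey`, the `{uri}local` key format, and the caller-supplied prefix-to-URI map are all hypothetical. When `firstIsUri` is false, the prefix is resolved first, so `h:p` and `{http://www.w3.org/1999/xhtml}p` produce the same key.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical two-part element name: the first part is either a namespace
// URI or a prefix, and the caller says which via a boolean.
struct TwoPartName {
    std::string first;  // URI or prefix, depending on the flag passed below
    std::string local;  // local part, e.g. "foo"
};

// Builds a single "{uri}local" key suitable for an id lookup. If the first
// part is a prefix, it is resolved through the in-scope prefix bindings,
// which the caller is assumed to supply.
std::string toLookupKey(const TwoPartName& name, bool firstIsUri,
                        const std::map<std::string, std::string>& prefixToUri) {
    std::string uri = name.first;
    if (!firstIsUri) {
        auto it = prefixToUri.find(name.first);
        if (it == prefixToUri.end())
            throw std::invalid_argument("unbound prefix: " + name.first);
        uri = it->second;
    }
    return "{" + uri + "}" + name.local;
}
```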


>But is there a parser input abstraction at the same logical level as SAX
>DocumentHandler or Xalan FormatListener?

I'm not sure I understand that question exactly. Whether we have a
particular implementation of whatever you might want is always an open
question. But no matter what you have, if it can be used to regenerate
a stream of legal XML, it can be fed back into the parser.
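As a small illustration of that point, here is a C++ sketch of an event sink that turns SAX-like start/characters/end callbacks back into XML text. It escapes the characters that cannot appear literally. `XmlRegenerator` and its method names are hypothetical. The resulting string is ordinary XML that could then be handed to the parser, for example from an in-memory buffer. The sketch ignores namespaces, comments, processing instructions, and the XML declaration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Escapes characters that cannot appear literally in XML text or in
// double-quoted attribute values.
std::string escapeXml(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            case '"': out += "&quot;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Hypothetical event sink modeled loosely on SAX-style callbacks. Whatever
// produces the events (a DOM walk, a transformer, ...), the text it builds
// is plain XML that a parser can read again.
class XmlRegenerator {
public:
    using Attributes = std::vector<std::pair<std::string, std::string>>;

    void startElement(const std::string& name, const Attributes& attrs = {}) {
        out_ += "<" + name;
        for (const auto& attr : attrs)
            out_ += " " + attr.first + "=\"" + escapeXml(attr.second) + "\"";
        out_ += ">";
    }
    void characters(const std::string& text) { out_ += escapeXml(text); }
    void endElement(const std::string& name) { out_ += "</" + name + ">"; }

    const std::string& text() const { return out_; }

private:
    std::string out_;
};
```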


>I agree that that is a cool way of doing it. It just happens to solve a
>different problem than the one I need to solve. I need to plug in the
>validator at random places in my process chain, ideally without
>re-parsing.
>

As long as you are willing to be parser-specific, you can do this. There are
no public APIs to do any of that type of stuff generically. If you are
passing a DOM down the chain, then you'd just have to get the parent
element name and the child element names and pass them to the validator to
be checked. Until I can add the helper method to do the retokenization for
you, you'd have to do this yourself, but it would be easy to write a
generic method that does it for you, to preflight the data for the call to
the validator.
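Here is a rough C++ sketch of that preflight step. It assumes a minimal hypothetical `Node` type rather than any real DOM API. The sketch gathers each element's child element names, passes them to a name-based check (such as the name-taking helper proposed above), and reports the first element whose children are rejected. `NameCheck`, `childElementNames`, and `preflight` are illustrative names. Text nodes are skipped, so this checks element structure only, not mixed or character content.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Minimal hypothetical stand-in for a DOM node; a real DOM has its own types.
struct Node {
    enum class Kind { Element, Text } kind;
    std::string name;                             // element name; empty for text
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical name-based content check, e.g. a wrapper that maps names to
// ids and calls the validator. Returns -1 if the children are acceptable,
// otherwise the index of the first offending child.
using NameCheck = std::function<int(const std::string& parent,
                                    const std::vector<std::string>& children)>;

// Collects the names of element children only; text nodes are skipped.
std::vector<std::string> childElementNames(const Node& parent) {
    std::vector<std::string> names;
    for (const auto& child : parent.children)
        if (child->kind == Node::Kind::Element) names.push_back(child->name);
    return names;
}

// Walks the tree depth-first (pre-order) and returns the first element whose
// children fail the check, or nullptr if everything passes.
const Node* preflight(const Node& node, const NameCheck& check) {
    if (node.kind != Node::Kind::Element) return nullptr;
    if (check(node.name, childElementNames(node)) != -1) return &node;
    for (const auto& child : node.children)
        if (const Node* bad = preflight(*child, check)) return bad;
    return nullptr;
}
```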


---------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]


