Re: [Xerces 2] accessing and controlling entity parsing in XNI

Aleksander Slominski Sun, 21 Oct 2001 16:38:17 -0700

Andy Clark wrote:

> Currently, we only provide the location via the XMLLocator
> passed to the startDocument/startDTD methods in the handlers.
> Can you use this between callbacks in order to determine the
> boundaries of the markup or content returned?


> Please note however, that the locations reported by the
> locator object are the row and column numbers of the position
> in the *transcoded* stream immediately following the last
> scanned markup or content. So this information does not
> reflect the actual position in the original stream because
> of various issues like character encoding, etc.

I know; that is why I would like the XNI XMLLocator to be extended to report the
position in the input stream, so that I can precisely access parts of the markup
or content. It would require keeping a per-entity position counted from the
beginning of each entity, but that should be the only change, and it should not
be difficult, should it?
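
To make the request concrete, here is a minimal sketch of the kind of bookkeeping involved. It is not Xerces code, and `PositionTrackingReader` is a hypothetical name. It wraps the Reader handed to the parser and counts the decoded characters delivered so far. Because the scanner reads ahead into its own buffer, this count marks the end of what the scanner has buffered, not the end of the last token. It would still have to be combined with the scanner's internal buffer offset (presumably `fCurrentEntity.position`), which is exactly what XNI does not expose today. For byte offsets, the same idea would wrap the InputStream instead.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical helper, not part of Xerces: wraps the Reader given to the
 * parser and counts the characters it has delivered so far.
 */
public class PositionTrackingReader extends FilterReader {
    private long position; // chars handed to the caller (the scanner) so far

    public PositionTrackingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) position++;
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) position += n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        position += skipped; // skipped chars still advance the offset
        return skipped;
    }

    // Rewinding would make the counter wrong, so mark/reset are refused.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readAheadLimit) throws IOException {
        throw new IOException("mark not supported");
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("reset not supported");
    }

    /** Offset, in decoded chars, of the next char the underlying stream will deliver. */
    public long getPosition() {
        return position;
    }
}
```

Counting at the Reader level keeps the sketch independent of the parser internals. The price is that it only tells you how far the scanner has *read*, not where a given callback's markup starts or ends.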

> > I would like to expose fCurrentEntity.position to the application and to allow
> > it to control peekChar() and load() behavior (load() is currently a private
> > final method ...).
>
> Why do you want to control the entity scanner? Other people
> (e.g. Xalan folks) have also asked about being able to control
> the input buffer in the parser. So it would be useful to know
> why you want this feature.

I am working on a SOAP processor. In this particular case messages are small and
may be forwarded to other SOAP processors, possibly with some modifications such
as removing part of the markup (e.g. a SOAP header). The best way to do this is
to keep buffering the input stream, record where all the skipped markup lies, and
then forward the XML content with slight modifications, such as skipping or
inserting some content (if necessary).

This can only be done if the parsing layer gives me precise positioning
information...
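
The forwarding step can be sketched as follows, assuming the parser could report the [start, end) character offsets of the markup to drop, which is precisely the missing feature. `MessageSplicer` and its methods are hypothetical names. The sketch copies the buffered message, skipping removed ranges and splicing in inserted text.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: rebuilds a buffered message for forwarding, dropping
 * or inserting text at character offsets the parser is assumed to report.
 */
public class MessageSplicer {
    /** Replace chars [start, end) of the original with replacement. */
    private static final class Edit {
        final int start;
        final int end;
        final String replacement;

        Edit(int start, int end, String replacement) {
            if (start < 0 || end < start) throw new IllegalArgumentException("bad range");
            this.start = start;
            this.end = end;
            this.replacement = replacement;
        }
    }

    private final List<Edit> edits = new ArrayList<>();

    /** Drop chars [start, end), e.g. the extent of a SOAP Header element. */
    public void remove(int start, int end) {
        edits.add(new Edit(start, end, ""));
    }

    /** Insert text at offset without removing anything. */
    public void insert(int offset, String text) {
        edits.add(new Edit(offset, offset, text));
    }

    /** Applies edits, which must be non-overlapping and added in document order. */
    public String apply(CharSequence original) {
        StringBuilder out = new StringBuilder(original.length());
        int copied = 0; // original[0, copied) has already been handled
        for (Edit e : edits) {
            if (e.start < copied || e.end > original.length()) {
                throw new IllegalStateException("edits overlap, are out of order, or exceed the input");
            }
            out.append(original, copied, e.start); // unchanged text before this edit
            out.append(e.replacement);
            copied = e.end;
        }
        out.append(original, copied, original.length()); // unchanged tail
        return out.toString();
    }
}
```

Since untouched text is copied verbatim from the buffer, the forwarded message keeps its original formatting, namespace declarations and encoding-independent content, which re-serializing from parser events would not guarantee.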

In Xml Pull Parser 2 I now have an X2 driver that uses the Xerces2 XNI pull
parsing API as an alternative to my default tokenizer/parser implementation, and
this positioning is the only missing feature that prevents me from using Xerces2
for efficient SOAP pull parsing :-)

> > Finally, I would like to be able to pin the input buffer so that it always
> > grows but is never shrunk with System.arraycopy(). This is very useful if I
> > want to keep an in-memory representation of the unparsed XML that can be used,
> > similarly to DOM, as a persistent representation of the XML document (to
> > reconstruct the DOM *when* it is needed...).
>
> This is a much more difficult request and I'll explain why.
>
> The scanner is implemented to be as efficient as possible.
> So it re-uses the underlying character buffer over and over
> again. We've been asked to add a feature to orphan the
> character array instead of re-using it so that people can
> keep a reference to the character array passed to the
> characters() method and know that the data won't be
> changed later. This could very easily be done.
>
> However, growing the underlying character array is much
> more difficult. Do you want the array to contain the decoded
> but non-normalized contents of the document? Or do you
> want the array to contain the "flattened" contents of
> the document, with all entities inlined, etc? And once
> you grow the array, then all of the array references
> and position information that you've collected during
> the parse become invalid.
>
> So I would advise not to go down that path.
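
The buffer re-use issue described above can be illustrated with a toy model. This is not Xerces code: `BufferReuseDemo`, `CharHandler` and `scan` are hypothetical names. A scanner that writes every chunk into the same char array silently changes the data behind any array reference a handler kept from an earlier characters() call. A handler that copies the slice (or receives an orphaned array) keeps stable data.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (not Xerces code) of a scanner that reuses one char buffer. */
public class BufferReuseDemo {
    /** Simplified stand-in for a characters() callback. */
    public interface CharHandler {
        void characters(char[] ch, int start, int length);
    }

    /** Reports each chunk through the same buffer, overwriting it every time. */
    public static void scan(String[] chunks, CharHandler handler) {
        char[] buffer = new char[64]; // reused for every callback (assumes chunks fit)
        for (String chunk : chunks) {
            chunk.getChars(0, chunk.length(), buffer, 0);
            handler.characters(buffer, 0, chunk.length());
        }
    }

    public static void main(String[] args) {
        List<char[]> keptRefs = new ArrayList<>();
        List<String> copies = new ArrayList<>();
        scan(new String[] {"first", "second"}, (ch, start, len) -> {
            keptRefs.add(ch);                       // unsafe: the array is reused
            copies.add(new String(ch, start, len)); // safe: copies the slice now
        });
        System.out.println(copies);                            // prints [first, second]
        System.out.println(new String(keptRefs.get(0), 0, 5)); // prints secon, not first
    }
}
```

Growing the buffer instead of reusing it would avoid the overwrite but, as noted, it moves data around and breaks any offsets collected so far. That is why orphaning or copying is the cheaper fix.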

Instead of pinning the buffer, I can pass my own reader that keeps the content of
the incoming input in a growable buffer. However, I will still need to install
some kind of entity manager so that I can wrap all entity inputs with my own
buffering reader (though that is not critical for SOAP, as the SOAP spec
disallows DTD declarations, so there are no external entities to worry about...).
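
A minimal sketch of such a buffering reader follows, assuming only that the parser accepts a caller-supplied java.io.Reader for the document entity. `RecordingReader` is a hypothetical name. It passes characters through unchanged and appends every delivered character to a StringBuilder that only ever grows, so offsets into the recording never shift. Note that it records what the parser has *read*, which may run ahead of what the parser has *reported*.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical sketch: a pass-through Reader that keeps a growable copy of
 * everything it has delivered to the parser.
 */
public class RecordingReader extends FilterReader {
    private final StringBuilder recorded = new StringBuilder(); // only ever appended to

    public RecordingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) recorded.append((char) c);
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) recorded.append(buf, off, n);
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Route skips through read() so skipped text is recorded too.
        long skipped = 0;
        while (skipped < n && read() != -1) skipped++;
        return skipped;
    }

    // Rewinding would make the recording diverge from the stream, so refuse it.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readAheadLimit) throws IOException {
        throw new IOException("mark not supported");
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("reset not supported");
    }

    /** Everything read so far; earlier offsets into it stay valid. */
    public CharSequence recorded() {
        return recorded;
    }
}
```

Wrapping at the Reader level needs no changes inside the parser. For documents with external entities, each entity's Reader would need the same wrapping, which is where an entity-manager hook would come in.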

How difficult would it be to do? I would be happy to do it myself (it actually
all looks not that complex), but I have no experience with the X2 codebase...

thanks,

alek



