Re: [Xerces 2] accessing and controling entity parsing in XNI

Aleksander Slominski Mon, 22 Oct 2001 13:17:16 -0700

Andy Clark wrote:

> Aleksander Slominski wrote:
> > content - it will require to keep per entity position since
> > beginning but it should be the only change and it should not
> > be difficult?
>
> It *seems* easy but it's not.
>
> The only reliable way of doing this is to write custom
> readers for every conceivable character encoding so that
> you can keep track of byte vs. char location in the XML
> document stream.


hi,

i was actually thinking about keeping the position in the UTF-16 input reader,
i.e. fCurrentEntity.reader, and just exposing fCurrentEntity.position, which
i think is much, much easier.

i agree completely that keeping the position in the original input stream
(with all possible encodings) would require a lot of work and could even
prevent efficient buffering.

so in a nutshell, a simple solution could be adding one method to XMLLocator:

    /** Returns the parser position, counted from the beginning of the current entity's input. */
    public int getCurrentEntityAbsoluteOffset();

and implementing it as part of XMLEntityManager, so that more precise
positioning is available. the effect on parser performance is minimal: just
one add operation in load(...).
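
to illustrate the idea, here is a minimal sketch of a buffered entity that keeps an absolute offset with one extra addition per buffer refill. TrackedEntity and its fields are hypothetical names made up for this example; they only mimic the shape of a scanned entity and are not the actual Xerces classes. the offset counts UTF-16 chars delivered by the reader, not bytes of the original stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for a scanned entity; NOT the real Xerces class. */
class TrackedEntity {
    private final Reader reader;
    private final char[] ch;  // chunk buffer filled by load()
    private int position;     // index of the next char to hand out
    private int count;        // number of valid chars currently in ch
    private int baseOffset;   // chars consumed by all previous buffer loads

    TrackedEntity(Reader reader, int bufferSize) {
        this.reader = reader;
        this.ch = new char[bufferSize];
    }

    /** Refills the buffer; returns false at end of input. */
    private boolean load() {
        baseOffset += count;  // the single extra add per refill
        position = 0;
        try {
            int n = reader.read(ch, 0, ch.length);
            count = Math.max(n, 0);
            return n > 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the next char, or -1 at end of input. */
    int read() {
        if (position == count && !load()) {
            return -1;
        }
        return ch[position++];
    }

    /** Position in chars since the beginning of this entity's input. */
    int getCurrentEntityAbsoluteOffset() {
        return baseOffset + position;
    }
}
```

because the scanner already knows its position inside the current chunk, the absolute offset is just the sum of the two counters, and the per-character path is unchanged.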


> > positioning is the only missing feature that prevents me from
> > using Xerces2 for efficient SOAP pull parsing :-)
>
> I'm having trouble buying that conclusion. :)

i am just saying that SOAP processing is a special kind of XML parsing and
sometimes requires special features, but it would be good to leverage the
existing well-tested infrastructure even in those special circumstances.

> > instead of pinpointing i can pass my own reader that will keep
> > content of incoming input in a growable buffer. however i still
>
> Unless your reader only returns one char at a time, this
> is not going to work because the parser reads the input in
> chunks. Therefore, the location your reader reports will
> be past the actual point where the scanner is looking at
> markup. And even if your reader limited chunking calls to
> a single char at a time, this is grossly inefficient.

i agree. i would use the reader only to preserve the input, not to do
positioning (and maybe also to prevent Xerces from closing my input stream).
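
a minimal sketch of such a reader, using only java.io: it copies everything the parser pulls through it into a growable buffer and ignores close(). RecordingReader and recorded() are hypothetical names for this example. note that the buffer holds everything the parser has read in chunks, which may be further ahead than the point the scanner is looking at, so it preserves input but does not give positions.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical wrapper: records all chars read and never closes the source. */
class RecordingReader extends FilterReader {
    private final StringBuilder recorded = new StringBuilder();

    RecordingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c >= 0) {
            recorded.append((char) c);
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            recorded.append(buf, off, n);
        }
        return n;
    }

    /** Skips by reading, so that skipped chars are recorded too. */
    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        char[] tmp = new char[1024];
        while (skipped < n) {
            int r = read(tmp, 0, (int) Math.min(tmp.length, n - skipped));
            if (r <= 0) {
                break;
            }
            skipped += r;
        }
        return skipped;
    }

    /** Mark/reset is disabled so the recording stays a simple linear copy. */
    @Override
    public boolean markSupported() {
        return false;
    }

    /** Deliberately a no-op: the caller keeps ownership of the source. */
    @Override
    public void close() {
    }

    /** Everything the consumer has pulled through this reader so far. */
    String recorded() {
        return recorded.toString();
    }
}
```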

> If you're relying on this provide the performance you
> need, then I would suggest attacking the performance from
> another angle.

i am interested in doing efficient dispatching/routing, and that requires
extracting partial information from the XML; since it is still XML, i need an
XML parser. so the idea is simply to pull-parse as much content as needed and
then dispatch the rest of it.
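
a rough sketch of that dispatch pattern. it uses the JDK's StAX pull API (javax.xml.stream) as a stand-in, not Xerces2/XNI, and SoapRouter and firstBodyChild are hypothetical names. it parses only up to the first element inside Body and stops; matching on the local name "Body" alone is a simplification, and a real router would also check the namespace and nesting depth.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical router: pulls just enough events to pick a dispatch key. */
class SoapRouter {
    /** Returns the local name of the first element inside Body, or null. */
    static String firstBodyChild(Reader input) {
        try {
            XMLStreamReader xml =
                XMLInputFactory.newInstance().createXMLStreamReader(input);
            boolean inBody = false;
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.START_ELEMENT) {
                    if (inBody) {
                        // Stop here; the rest of the message is left unparsed.
                        return xml.getLocalName();
                    }
                    if ("Body".equals(xml.getLocalName())) {
                        inBody = true;
                    }
                }
            }
            return null;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

the returned name can serve as the routing key, and whoever receives the message then decides how to handle the remaining content.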

thanks,

alek


