Andy Clark wrote:
> Aleksander Slominski wrote:
> > content - it will require to keep per entity position since
> > beginning but it should be the only change and it should not
> > be difficult?
>
> It *seems* easy but it's not.
>
> The only reliable way of doing this is to write custom
> readers for every conceivable character encoding so that
> you can keep track of byte vs. char location in the XML
> document stream.
Hi,

I was actually thinking of tracking the position in the UTF-16 input
reader, i.e. fCurrentEntity.reader, and just exposing
fCurrentEntity.position, which I think is much, much easier.

I agree completely that tracking the position in the original input
stream (for all possible encodings) would require a lot of work and
could even prevent efficient buffering.
So, in a nutshell, a simple solution would be to add one method to
XMLLocator:

    /** Returns the parser position, counted from the beginning of the current entity's input. */
    public int getCurrentEntityAbsoluteOffset();

and to implement it as part of XMLEntityManager so that more precise
positioning is available. The effect on parser performance is minimal:
just one add operation in load(...).
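
To make the idea concrete, here is a minimal sketch of the bookkeeping I have in mind. It is not the Xerces code: TrackedEntity, baseOffset and next() are hypothetical names, and the real load(...) does more than this simplified one. It only shows that a single addition per buffer refill is enough to report an absolute char offset.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical stand-in for a scanned entity; NOT the actual Xerces class. */
final class TrackedEntity {
    private final Reader reader;
    private final char[] ch;
    private int position;   // index of the next char to consume in ch
    private int count;      // number of valid chars currently in ch
    private int baseOffset; // chars consumed in all previous buffers

    TrackedEntity(Reader reader, int bufferSize) {
        this.reader = reader;
        this.ch = new char[bufferSize];
    }

    /** Refills the buffer once it is exhausted; returns false at end of input. */
    private boolean load() throws IOException {
        baseOffset += count; // the one extra add operation per refill
        position = 0;
        count = 0;
        int n = reader.read(ch, 0, ch.length);
        if (n <= 0) {
            return false;
        }
        count = n;
        return true;
    }

    /** Returns the next char, or -1 at end of input. */
    int next() throws IOException {
        if (position == count && !load()) {
            return -1;
        }
        return ch[position++];
    }

    /** Chars consumed since the beginning of this entity's input. */
    int getCurrentEntityAbsoluteOffset() {
        return baseOffset + position;
    }
}
```

The offset is counted in chars delivered by the reader, not in bytes of the original stream, so no encoding-specific readers are needed.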
> > positioning is the only missing feature that prevents me from
> > using Xerces2 for efficient SOAP pull parsing :-)
>
> I'm having trouble buying that conclusion. :)
I am just saying that SOAP processing is a special kind of XML parsing
that sometimes requires special features, but it would be good to
leverage the existing, well-tested infrastructure even in those special
circumstances.
> > instead of pinpointing i can pass my own reader that will keep
> > content of incoming input in a growable buffer. however i still
>
> Unless your reader only returns one char at a time, this
> is not going to work because the parser reads the input in
> chunks. Therefore, the location your reader reports will
> be past the actual point where the scanner is looking at
> markup. And even if your reader limited chunking calls to
> a single char at a time, this is grossly inefficient.
I agree. I would use the reader only to preserve the input, not to do
positioning (and maybe also to prevent Xerces from closing my input
stream ...).
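
Roughly, the reader would look like the sketch below. RecordingReader, recordedSoFar() and closeUnderlying() are hypothetical names of mine, not part of any Xerces API. It records whatever the parser pulls in chunks and turns close() into a no-op; because of read-ahead, the recorded text will usually extend past the point the scanner has reached, which is exactly why the position has to come from the parser and not from this reader.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical wrapper: keeps a copy of every char read and ignores close(). */
final class RecordingReader extends FilterReader {
    // Growable buffer holding everything handed to the parser so far.
    private final StringBuilder recorded = new StringBuilder();

    RecordingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c != -1) {
            recorded.append((char) c);
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            recorded.append(buf, off, n);
        }
        return n;
    }

    /** Deliberately a no-op so the parser cannot close the underlying stream. */
    @Override
    public void close() {
    }

    /** Everything read so far, including any read-ahead by the parser. */
    String recordedSoFar() {
        return recorded.toString();
    }

    /** Lets the owner close the stream when it is really done with it. */
    void closeUnderlying() throws IOException {
        in.close();
    }
}
```

Chars passed over with skip() are not recorded in this sketch.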
> If you're relying on this to provide the performance you
> need, then I would suggest attacking the performance from
> another angle.
I am interested in efficient dispatching/routing, which requires
extracting partial information from the XML; since it is still XML, I
need an XML parser. So the idea is simply to pull-parse as much content
as needed and then dispatch the rest.
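
As an illustration of "parse only as much as needed", here is a small sketch that stops at the first element inside Body and returns its name as a routing key. Note that it uses the standard javax.xml.stream (StAX) pull API as a stand-in, not Xerces2, and that RoutingKey and firstBodyChild are hypothetical names. It also matches Body by local name only and does not check the SOAP namespace, which a real dispatcher would.

```java
import java.io.Reader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: finds the element to dispatch on without parsing the rest. */
final class RoutingKey {
    /** Returns the local name of the first element inside Body, or null if there is none. */
    static String firstBodyChild(Reader input) throws XMLStreamException {
        XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(input);
        try {
            boolean inBody = false;
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if (inBody) {
                        return xml.getLocalName(); // stop here, rest stays unparsed
                    }
                    // Assumption: local name only; the namespace is not verified.
                    if ("Body".equals(xml.getLocalName())) {
                        inBody = true;
                    }
                } else if (inBody && event == XMLStreamConstants.END_ELEMENT) {
                    return null; // Body closed without any child element
                }
            }
            return null;
        } finally {
            xml.close();
        }
    }
}
```

What this sketch cannot do is tell me where in the input the parser stopped, which is why I would like the offset exposed.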
thanks,
alek