The getSrcOffset() method of XMLScanner should give you the information
you want. However, it can only do that if source offset tracking is
supported by the transcoding system being used. For ICU and the internal
transcoders that is true. I just checked the latest files in the repository,
and the Win32 and ICU transcoders do support this functionality.
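
Transcoder support matters because only the transcoder knows how many raw bytes went into each character it produces. Below is a minimal sketch of that idea, assuming UTF-8 input decoded to UTF-32. The names TranscodeResult and transcodeUtf8 are hypothetical and are not Xerces-C APIs. Alongside each output character, the sketch records that character's size in source bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not Xerces-C code.
struct TranscodeResult {
    std::vector<char32_t> chars;           // transcoded characters (UTF-32)
    std::vector<unsigned char> charSizes;  // raw bytes consumed by each character
    std::size_t bytesEaten = 0;            // total raw bytes consumed
};

// Decodes as many complete UTF-8 characters as are available in src.
// A trailing partial character is left unconsumed (bytesEaten stops before it)
// so it can be completed after the next raw-buffer refill.
// Validation of continuation bytes and overlong forms is omitted.
TranscodeResult transcodeUtf8(const unsigned char* src, std::size_t srcCount) {
    assert(src != nullptr || srcCount == 0);
    TranscodeResult out;
    std::size_t i = 0;
    while (i < srcCount) {
        const unsigned char lead = src[i];
        std::size_t len;
        char32_t cp;
        if (lead < 0x80)       { len = 1; cp = lead; }
        else if (lead >= 0xF0) { len = 4; cp = lead & 0x07; }
        else if (lead >= 0xE0) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { len = 2; cp = lead & 0x1F; }
        else break;  // stray continuation byte: stop here (error handling omitted)
        if (i + len > srcCount) break;  // incomplete character: wait for more input
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (src[i + k] & 0x3F);
        out.chars.push_back(cp);
        out.charSizes.push_back(static_cast<unsigned char>(len));
        i += len;
    }
    out.bytesEaten = i;
    return out;
}
```

The charSizes array is the piece of information a reader needs later to map a position in the transcoded buffer back to a position in the raw bytes.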

So if you get the scanner and call getSrcOffset(), it should return the
position where it stopped transcoding the element it just passed to you.
This position is in terms of the raw content buffer it is parsing from, i.e.
the pre-transcoded input. If it's not returning the correct info, then
perhaps it has become broken over time, since hardly anyone ever uses it.
But it used to work, because we had to make it so for an internal IBM
customer at the time.
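
Once per-character sizes are available, a reader can report a raw-input offset by keeping a running byte count that survives buffer refills instead of being thrown away. The following is a hypothetical sketch. OffsetTracker is not Xerces-C's XMLReader, and the sketch assumes each buffer is fully consumed before the next refill. Its srcOffset() plays the role that getSrcOffset() is described as playing here.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch, not Xerces-C's XMLReader.
class OffsetTracker {
public:
    // Load a new transcoded buffer. charSizes[i] is the number of raw bytes
    // that produced character i. Simplifying assumption: the previous buffer
    // has been fully consumed, so its bytes are folded into the base offset
    // rather than discarded.
    void refill(std::vector<unsigned char> charSizes) {
        assert(next_ == sizes_.size());
        baseOffset_ += consumedInBuffer_;
        sizes_ = std::move(charSizes);
        next_ = 0;
        consumedInBuffer_ = 0;
    }

    // Consume one character; returns false when the buffer is exhausted.
    bool advance() {
        if (next_ >= sizes_.size()) return false;
        consumedInBuffer_ += sizes_[next_++];
        return true;
    }

    // Raw byte offset just past the last consumed character.
    std::size_t srcOffset() const { return baseOffset_ + consumedInBuffer_; }

private:
    std::vector<unsigned char> sizes_;
    std::size_t next_ = 0;             // index of the next unconsumed character
    std::size_t consumedInBuffer_ = 0; // raw bytes consumed from the current buffer
    std::size_t baseOffset_ = 0;       // raw bytes consumed by earlier buffers
};
```

For example, after a refill with sizes {1, 2, 1} (UTF-8 "aéb") and two advance() calls, srcOffset() returns 3. The key design choice is that refilling adds the old buffer's bytes to baseOffset_ instead of resetting the count to zero.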

--------------------------
Dean Roddey
The Charmed Quark Controller
Charmed Quark Software
[EMAIL PROTECTED]
http://www.charmedquark.com

"If it don't have a control port, don't buy it!"


----- Original Message -----
From: "Jason E. Stewart" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Tuesday, April 23, 2002 7:31 PM
Subject: Re: how to access the raw text that generated a sax event


> "Jason E. Stewart" <[EMAIL PROTECTED]> writes:
>
> > Any ideas what to do?
>
> I finally broke down and read the source code for XMLScanner and
> XMLReader, and I'm convinced that without a major rewrite, this is
> not possible.
>
> Basically, the XMLReader calls readBytes() on the stream to fill up a
> buffer, so curPos() could never help us, since the stream is read in
> chunks of the buffer size.
>
> Then, the XMLReader maintains two internal buffers: one of raw bytes
> and the other of transcoded characters. When the transcoded buffer
> starts running low, it transcodes another bufferful from the raw
> buffer and all the information about how many characters have been
> read so far is thrown away.
>
> Also when the raw buffer is running low it reads in more data from the
> stream, and it too throws away all the information about how many
> bytes have been processed so far.
>
> It would be possible to save this information when the buffers are
> refilled, but it *still* wouldn't give us the info that we
> want. Because the XMLScanner gets all its data from the transcoded
> character buffer, at best we could hope to find out what *character*
> position we are at in the file. But we can only be sure of the
> character <-> byte mapping for fixed-width encodings, which won't
> help us for UTF-16 or UTF-8, but I guess it would work for
> ISO-8859-1 and ASCII.
>
> So I'm out of ideas.
> jas.
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>

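The fixed-width limitation mentioned above can be illustrated with a small sketch. The function names are hypothetical. For a fixed-width encoding such as ASCII or ISO-8859-1, the byte offset of character n is n times the width. For UTF-8, whose characters take 1 to 4 bytes, the offset can only be found by walking the raw bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Fixed-width encodings: character n starts at byte n * width
// (width 1 for ASCII or ISO-8859-1).
std::size_t fixedWidthByteOffset(std::size_t charIndex, std::size_t width) {
    assert(width > 0);
    return charIndex * width;
}

// UTF-8: characters occupy 1 to 4 bytes, so the only way to find where
// character n starts is to walk the raw bytes. Continuation bytes have the
// bit pattern 10xxxxxx; every other byte starts a new character.
// Assumes well-formed UTF-8; returns bytes.size() if charIndex is past the end.
std::size_t utf8ByteOffset(const std::string& bytes, std::size_t charIndex) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if ((b & 0xC0) != 0x80) {
            if (seen == charIndex) return i;
            ++seen;
        }
    }
    return bytes.size();
}
```

With the UTF-8 bytes for "aéb", character 2 ('b') starts at byte 3, while the fixed-width formula with width 1 would give byte 2.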
