Re: how to access the raw text that generated a sax event

Jason E. Stewart Tue, 23 Apr 2002 10:16:25 -0700

"Murphy, James" <[EMAIL PROTECTED]> writes:

> I thought this would be really handy when parsing from a continuous buffer
> like a MemBufInputSource or a LocalFileInputSource.  I have a situation
> where I SAX parse _very_ large XML instances looking for small repeating
> fragments.  Each fragment is processed individually by building a DOM from
> it and operating on those nodes in all sorts of application-defined ways.
> 
> If I had the functionality described by Ted, I could SAX the file and save
> off the starting and ending offsets into the large document.  Post that info
> to a thread pool to process the fragments asynchronously.  In fact, I can
> use my Win32 memory mapped file input source to SAX the original large file
> and serve as a source to the DOM parser during the per work item processing.
> The way I'm doing it now involves _way_ too many buffer copies to be really
> fast - but it could be.


Hey Jim,

I agree. For the MAGE object model we're going to be routinely parsing
big chunks of scientific data, maybe 0.5 GB to 2.0 GB, and looking for
certain pieces of the data. I'd like to be able to do lazy parsing,
and just store the byte offsets to the bits that I want. 
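
To make the idea concrete, here is a rough sketch of what I mean by
storing offsets and materialising fragments later. It is not Xerces
code: findFragments is a hypothetical, deliberately naive stand-in for
the SAX pass, and Span and fragmentText are names I made up for
illustration. A real version would get the offsets from the scanner
and would presumably hand the byte range to something like a
MemBufInputSource instead of copying it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Half-open byte range [begin, end) into the original document buffer.
struct Span {
    std::size_t begin;
    std::size_t end;
};

// Hypothetical stand-in for the SAX pass: record where each <tag>...</tag>
// fragment starts and ends. Deliberately naive: it assumes fragments of this
// tag do not nest, skips self-closing tags, and does not understand comments
// or CDATA sections.
std::vector<Span> findFragments(const std::string& buf, const std::string& tag) {
    std::vector<Span> spans;
    const std::string open = "<" + tag;
    const std::string close = "</" + tag + ">";
    std::size_t pos = 0;
    while ((pos = buf.find(open, pos)) != std::string::npos) {
        const std::size_t after = pos + open.size();
        // Reject longer names that merely share a prefix, e.g. <itemize>.
        const bool nameEnds = after < buf.size() &&
            (buf[after] == '>' || buf[after] == ' ' || buf[after] == '\t' ||
             buf[after] == '\n' || buf[after] == '\r');
        if (!nameEnds) {
            pos = after;
            continue;
        }
        const std::size_t stop = buf.find(close, after);
        if (stop == std::string::npos) break;  // unterminated fragment
        spans.push_back(Span{pos, stop + close.size()});
        pos = stop + close.size();
    }
    return spans;
}

// Materialise one fragment only when a worker actually needs it.
// This copies for simplicity; the offsets themselves are what get stored.
std::string fragmentText(const std::string& buf, const Span& s) {
    assert(s.begin <= s.end && s.end <= buf.size());
    return buf.substr(s.begin, s.end - s.begin);
}
```

The first pass only keeps two integers per fragment, so the list of
Spans stays small even for a multi-gigabyte buffer, and each Span can
be posted to a worker that builds its DOM independently.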

There *has* to be some easy modification that we can make to subclass
InputSource or XMLScanner to get this working. I don't know enough
about the internals of how the scanner works, but if someone can clue
me in a bit, I'd be happy to implement this.

jas.

