"Murphy, James" <[EMAIL PROTECTED]> writes: > I thought this would be really handy when parsing from a continuous buffer > like a MemBufInputSource or a LocalFileInputSource. I have a situation > where I SAX parse _very_ large XML instances looking for small repeating > fragments. These fragments are operated on individually by making a DOM to > operating on those nodes in all sorts of application defined ways. > > If I had the functionality described by Ted, I could SAX the file and save > off the starting and ending offsets into the large document. Post that info > to a thread pool to process the fragments asynchronously. In fact, I can > use my Win32 memory mapped file input source to SAX the original large file > and serve as a source to the DOM parser during the per work item processing. > The way I'm doing it now involved _way_ too many buffer copies to be really > fast - but it could be.
Hey Jim,

I agree. For the MAGE object model we're going to be routinely parsing big chunks of scientific data, maybe 0.5 GB to 2.0 GB, and looking for certain pieces of the data. I'd like to be able to do lazy parsing and just store the byte offsets of the bits that I want.

There *has* to be some easy modification that we can make, such as subclassing InputSource or XMLScanner, to get this working. I don't know enough about the internals of how the scanner works, but if someone can clue me in a bit, I'd be happy to implement this.

jas.
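
To make the idea concrete, here is a rough sketch of the "record byte offsets during a streaming pass, build a small DOM later" approach. It is written against Python's standard expat binding rather than Xerces-C, because expat exposes the byte offset of the current event directly (`CurrentByteIndex`); how to get the equivalent out of the Xerces scanner is exactly the open question in this thread. The helper names `index_fragments` and `load_fragment`, the `frag` element name, and the sample document are all made up for illustration. The sketch also assumes each end tag is written exactly as `</frag>` (no whitespace inside the tag, no empty-element form).

```python
import xml.parsers.expat
from xml.dom import minidom


def index_fragments(data: bytes, tag: str):
    """Stream through data once and return (start, end) byte offsets
    of every outermost <tag>...</tag> element. No DOM is built here."""
    parser = xml.parsers.expat.ParserCreate()
    offsets = []
    state = {"depth": 0, "start": None}

    def start_element(name, attrs):
        if name == tag:
            if state["depth"] == 0:
                # Byte offset of the '<' that opens the start tag.
                state["start"] = parser.CurrentByteIndex
            state["depth"] += 1

    def end_element(name):
        if name == tag:
            state["depth"] -= 1
            if state["depth"] == 0:
                # CurrentByteIndex points at the '<' of the end tag.
                # Assumption: the end tag is written exactly as </tag>.
                end = parser.CurrentByteIndex + len(f"</{tag}>".encode("utf-8"))
                offsets.append((state["start"], end))

    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.Parse(data, True)
    return offsets


def load_fragment(data: bytes, span):
    """Build a DOM for one recorded fragment, only when it is needed."""
    start, end = span
    return minidom.parseString(data[start:end])


# Hypothetical sample document standing in for a very large file.
doc = (b"<root><hdr/><frag id='1'><v>10</v></frag>"
       b"<pad>x</pad><frag id='2'><v>20</v></frag></root>")

spans = index_fragments(doc, "frag")   # cheap streaming pass
dom = load_fragment(doc, spans[1])     # lazy DOM for the second fragment only
print(spans, dom.documentElement.getAttribute("id"))
```

The recorded spans are plain integer pairs, so they could be posted to a thread pool as work items, with each worker reading its slice from a shared (for example memory-mapped) buffer.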
