Re: same question

roddey 5 Apr 2000 17:18:42 -0000

It doesn't read the entire document into memory. It reads 32K chunks at a
time, and it calls each event handler as it parses that event data from the
buffer. This is the same regardless of which parser is used, since it's a
characteristic of the underlying XMLScanner.
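
As a rough illustration of handlers firing before the end of the input has
been seen, here is a minimal sketch. It uses Python's standard xml.sax module
(an expat-based parser) rather than Xerces, so it only shows the general SAX
behavior, not the XMLScanner internals. The Recorder class and the
events_before_eof function are hypothetical names made up for this example.

```python
import xml.sax


class Recorder(xml.sax.ContentHandler):
    """Collects element events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def events_before_eof():
    handler = Recorder()
    parser = xml.sax.make_parser()  # incremental (feed/close) parser
    parser.setContentHandler(handler)

    # First piece: the document is not finished, the root is still open.
    parser.feed("<root><item>1</item>")
    early = list(handler.events)  # snapshot taken before the rest is supplied

    # Second piece completes the document.
    parser.feed("<item>2</item></root>")
    parser.close()
    return early, handler.events
```

The snapshot taken after the first feed() already contains the events for the
first item, even though the root end tag has not been supplied yet.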

----------------------------------------
Dean Roddey
Software Weenie
IBM Center for Java Technology - Silicon Valley
[EMAIL PROTECTED]



Joe Futrelle <[EMAIL PROTECTED]> on 04/04/2000 07:07:02 PM

Please respond to [EMAIL PROTECTED]

To:   [EMAIL PROTECTED]
cc:
Subject:  Re: same question



Gotcha.  OK, one more question: is it legal for a SAX parser to call
handler methods as it reads elements, and not after it has read the
final EOF?  Xerces's current behavior regardless of the input source
is to read the entire document into a memory buffer and then parse the
data in the buffer, calling the handler methods as appropriate.

It seems on first blush that SAX itself doesn't require this behavior,
and that it might be beneficial for codes that process enormous XML
files or long streams to receive SAX events as the document is read
from the file or stream, so that they can opt to retain only as much
character data and parser state as is necessary to perform whatever
processing they're doing at a given point in the XML document.

For instance, if I were writing a high-performance XSL processor and
wanted it to run a stylesheet template like

<xsl:template match="foo">
  <bar><xsl:value-of select="."/></bar>
</xsl:template>

on a 1GB XML file, it could certainly in theory do this without
needing to fill a 1GB memory buffer with the entire document.  This is
of course a trivial case but there are classes of processing tasks
that can be treated this way.
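
A minimal sketch of that kind of processing, assuming Python's standard
xml.sax module in place of Xerces: the handler below imitates only the effect
of the foo template above (it ignores XSLT's built-in rules for everything
else) and keeps nothing in memory except the text of the foo element it is
currently inside. FooToBar and transform are hypothetical names used only for
this illustration.

```python
import io
import xml.sax
from xml.sax.saxutils import escape


class FooToBar(xml.sax.ContentHandler):
    """Writes <bar>text</bar> for each outermost <foo>, keeping only its text."""

    def __init__(self, out):
        super().__init__()
        self.out = out
        self.depth = 0   # nesting depth inside the current <foo>; 0 = outside
        self.text = []   # character data of the current <foo> only

    def startElement(self, name, attrs):
        if self.depth or name == "foo":
            self.depth += 1

    def characters(self, content):
        if self.depth:
            self.text.append(content)

    def endElement(self, name):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                # The <foo> just closed: emit its string value and forget it.
                self.out.write("<bar>" + escape("".join(self.text)) + "</bar>\n")
                self.text.clear()


def transform(stream):
    """Parse a byte stream and return the generated <bar> elements."""
    out = io.StringIO()
    xml.sax.parse(stream, FooToBar(out))
    return out.getvalue()
```

The input can be any file-like object, for example
transform(open("big.xml", "rb")) with a hypothetical file name. The parser
reads it in pieces, so memory use is bounded by the largest single foo
element rather than by the size of the document.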

On Tue, Apr 04, 2000 at 03:28:34PM -0600, [EMAIL PROTECTED] wrote:
> No, there isn't. Xerces is an XML parser, which means it parses XML as
> defined in the XML spec. What you want to do is not XML as the spec defines
> it. You will have to break the data into legal documents and feed them to
> the parser.
>
> The only other option is that you effectively feed it an endless document,
> i.e. you send up to the root element, then you start sending 'data packets'
> which are elements that are children of that root element, each packet of
> which is one of the bundles of info you care about. When you are ready to
> drop the link, send the end tag for the root, and drop the connection, to
> end the parse.
>
> ----------------------------------------
> Dean Roddey
> Software Weenie
> IBM Center for Java Technology - Silicon Valley
> [EMAIL PROTECTED]
>
>
>
> Joe Futrelle <[EMAIL PROTECTED]> on 04/04/2000 01:12:38 PM
>
> Please respond to [EMAIL PROTECTED]
>
> To:   [EMAIL PROTECTED]
> cc:
> Subject:  same question
>
>
>
> I've looked further into the Xerces-C code and it appears that both
> the SAX and DOM parsers use XMLReader to get character data from the
> input source, and that XMLReader assumes that it can read until EOF.
>
> What I'd like to do then seems impossible; which is to read top-level
> elements from a socket, one at a time, taking action on them in
> realtime as they're read.  I want the parser to tell me when to stop
> reading one element and expect the next; I don't want to have to
> develop a redundant protocol to wrap the XML in.  Is there some
> workaround to get this behavior?
>
> ------------------------------------------------------------------------
> Joe Futrelle                  | "The copperhead could have been
> Team Leader, Emerge           | delivering dance-hall cheats, with you,
> Scientific Data Tech. / NCSA  | Peter shouldn't have been apprising
> http://emerge.ncsa.uiuc.edu/  | Leyland." -- J. Philippa Warwickshire
>
>
------------------------------------------------------------------------
Joe Futrelle                  | "When isn't a hiccoughing sorcery
Team Leader, Emerge           | oftener theirs?" -- Prof. Sara Zurich
Scientific Data Tech. / NCSA  |
http://emerge.ncsa.uiuc.edu/  |