My XML doesn't get within 100 miles of a DTD.  If I care to validate, I use
a schema.  The chunks I extract are well-formed XML, because I know the
structure of the document I'm parsing in advance.  The documents look like:

<myRoot>
        <info>
        </info>
        <record>...</record>
        <record>...</record>
        <record>...</record>
        <record>...</record>
        <record>...</record>
        ... 500 MB later ...
        <record>...</record>
</myRoot>


I'm shredding this document into n <record> sub-documents.  The faster I can
do that, the faster I can get my cluster of <record> processing servers
working.
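
To illustrate the kind of shredding described above, here is a minimal
sketch in Python using the standard library's
xml.etree.ElementTree.iterparse.  It is not the code discussed in this
thread, which concerns SAX events.  The function name shred_records is
hypothetical, and the sketch assumes every <record> is a direct child of
the root element, as in the layout shown earlier.

```python
import io
import xml.etree.ElementTree as ET


def shred_records(source, record_tag="record"):
    """Yield each top-level <record> as a standalone XML string.

    Assumption: records are direct children of the root element.
    `source` is a filename or a binary file object.
    """
    depth = 0
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            depth += 1
            if root is None:
                root = elem  # first start event is the root element
        else:
            depth -= 1
            # After the decrement, depth is the depth of elem's parent,
            # so 1 means elem is a direct child of the root.
            if depth == 1 and elem.tag == record_tag:
                elem.tail = None  # drop whitespace that follows the end tag
                yield ET.tostring(elem, encoding="unicode")
                # Discard children already handled so memory use does not
                # grow with the size of the input.
                root.clear()


if __name__ == "__main__":
    sample = (
        b"<myRoot><info></info>"
        b"<record><a>1</a></record>\n"
        b"<record><a>2</a></record>"
        b"</myRoot>"
    )
    for chunk in shred_records(io.BytesIO(sample)):
        print(chunk)
```

Each yielded string is a well-formed document with <record> as its root, so
it could be handed to a worker as is.  The sketch re-serializes the parsed
element rather than returning the raw source text, so details such as
attribute quoting or entity usage may differ from the input.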

Jim


> -----Original Message-----
> From: Dean Roddey [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, April 23, 2002 6:50 PM
> To: [EMAIL PROTECTED]
> Subject: Re: RE: how to access the raw text that generated a sax event
> 
> 
> > I am working on a system that will be responsible for
> > splitting large XML files into record-sized chunks.
> > These chunks will be handed off to end-users who
> > want the option of parsing them with whatever parser
> > they choose.
> 
> No XML-compliant parser should parse such chunks, because they are not
> valid XML, to be anal retentive about it. And, if they do have any entity
> references in them, they will be unexpandable even by those parsers,
> because they will have no clue how to get back to the DTD that defined
> them.
> 
> --------------------------
> Dean Roddey
> The Charmed Quark Controller
> Charmed Quark Software
> [EMAIL PROTECTED]
> http://www.charmedquark.com
> 
> "If it don't have a control port, don't buy it!"
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
