My XML doesn't get within 100 miles of a DTD. If I care to validate, I use
XML Schema. The chunks I produce are well-formed XML, because I know a priori
the structure of the XML I'm parsing. The source documents look like this:
<myRoot>
  <info>
  </info>
  <record>...</record>
  <record>...</record>
  <record>...</record>
  <record>...</record>
  <record>...</record>
  ... 500 MB later ...
  <record>...</record>
</myRoot>
I'm shredding this document into n <record> sub-documents. The faster I can
do that, the faster I can get my cluster of <record>-processing servers
working.
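
For illustration only, here is a rough sketch of that shredding step. It is
not the SAX-based approach discussed in this thread: it assumes Python's
standard xml.etree.ElementTree instead, and it re-serializes each record
rather than recovering the raw text behind the parse events. The function
name iter_records and the record_tag parameter are made up for the example.
It also assumes, as stated above, that there is no DTD and so no entity
declarations to carry along with each chunk.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag="record"):
    """Yield each direct child <record> of the root as a standalone XML string.

    `source` is a filename or a binary file object. Hypothetical helper,
    written for this example only.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    depth = 1                # we are now inside the root
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 1:       # elem is a direct child of the root
            if elem.tag == record_tag:
                elem.tail = None  # drop whitespace that follows the element
                yield ET.tostring(elem, encoding="unicode")
            # Release children that are already handled so memory stays flat
            # even for a very large input document.
            root.clear()


# Small in-memory stand-in for the 500 MB file.
sample = b"<myRoot><info>x</info><record>a</record><record>b</record></myRoot>"
for chunk in iter_records(io.BytesIO(sample)):
    print(chunk)
```

Each yielded string is a well-formed document on its own, so it can be handed
to whatever parser the consumer prefers. Namespace declarations made on the
root would be redeclared by the serializer, possibly with different prefixes,
which is one reason grabbing the raw text can still be preferable.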
Jim
> -----Original Message-----
> From: Dean Roddey [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, April 23, 2002 6:50 PM
> To: [EMAIL PROTECTED]
> Subject: Re: RE: how to access the raw text that generated a sax event
>
>
> > I am working on a system that will be responsible for
> > splitting large XML files into record sized chunks.
> > These chunks will be handed off to end-users who
> > want the option of parsing them with whatever parser
> > they choose.
>
> No XML-compliant parser should parse such chunks, because they are
> not valid XML, to be anal retentive about it. And if they have any
> entity references in them, those references will be unexpandable even
> by those parsers, because the parsers will have no clue how to get
> back to the DTD that defined them.
>
> --------------------------
> Dean Roddey
> The Charmed Quark Controller
> Charmed Quark Software
> [EMAIL PROTECTED]
> http://www.charmedquark.com
>
> "If it don't have a control port, don't buy it!"
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>