You can easily read the XML from the TCP/IP connection yourself, find the 
ending tag, process the document, then read the next document, process it, and 
so on.  We always do it that way (it is much easier than the other ideas).  You 
know the ending tag from the starting tag.  You do have to deal with blocking 
versus non-blocking reads:  we do a blocking read of one byte and, as soon as 
we get something, we read until the ending tag and then pause for processing. 
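
A rough sketch of that approach in Python (not our actual code).  The function 
name read_document is made up for illustration.  It assumes the root element 
name does not recur as a nested element, that the end tag is written without 
internal whitespace, and that no comment, CDATA section, or attribute value 
contains something that looks like the root start or end tag.  It reads one 
byte at a time throughout, which is simpler than switching read modes and is 
cheap on a buffered stream such as the one returned by sock.makefile("rb").

```python
import io
import re

# First start tag in the buffer: "<" followed by a name character.
# This skips "<?xml ...?>" and "<!-- ... -->", which begin with "<?" and "<!".
ROOT_START = re.compile(rb"<([A-Za-z_][^\s/>]*)")


def read_document(stream):
    """Return the bytes of the next document on `stream`, or None at EOF.

    `stream` is any binary file-like object whose read(1) blocks until a
    byte is available and returns b"" at end of stream.
    """
    buf = bytearray()
    end_tag = None
    while True:
        byte = stream.read(1)  # blocking read of a single byte
        if not byte:
            return None  # EOF; any partial document is discarded
        buf += byte
        if byte != b">":
            continue  # tags can only end on ">", so nothing to check yet
        if end_tag is None:
            match = ROOT_START.search(buf)
            if match:
                # The ending tag is known from the starting tag.
                end_tag = b"</" + match.group(1) + b">"
                if buf.endswith(b"/>"):
                    return bytes(buf).lstrip()  # self-closing root element
        elif buf.endswith(end_tag):
            # Stop exactly at the root end tag; nothing beyond it is consumed.
            return bytes(buf).lstrip()


if __name__ == "__main__":
    data = b'<?xml version="1.0"?><!-- c --><log><e>1</e></log>\n<log><e>2</e></log>'
    stream = io.BytesIO(data)
    while (doc := read_document(stream)) is not None:
        print(doc)  # hand each complete document to the real parser here
```

Each returned chunk can then be passed to the XML parser as a complete 
document of its own, so the parser never sees the start of the next one.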


From: xml On Behalf Of Webb Scales
Sent: Monday, September 09, 2019 9:30 PM
To: Liam R E Quin
Subject: Re: [xml] Recovering from errors in an XML "stream"


I'm OK with making small on-the-fly "edits" to the input (such as removing the 
initial comment, or removing all comments), but trying to make my code discern 
the overall structure (such as picking out the boundaries between the 
documents) is starting to step over into actually parsing it, which defeats the 
purpose of using LibXML2.

If the TextReader didn't insist upon reading beyond the root end-tag, that 
would enable me to solve my problem, I think.  (I don't understand why it does 
that.)  In the absence of any other options, I'm going to experiment with the 
SAX interface and see if that will allow me to stop the parse at the right spot.

Anyway, thanks for your replies, Liam.


On 9/10/19 12:19 AM, Liam R E Quin wrote:

On Mon, 2019-09-09 at 22:41 -0400, Webb Scales wrote:

fact remains that I don't control the text that I'm trying to parse,
and I still need to parse it, even though it's not "well-formed".

You may need to write some form of pre-processor that fixes the
problems. As you say, that may reduce the need for an XML parser.
I haven't investigated error recovery with libxml, so someone else
might have better ideas.



Webb Scales 
Principal Software Architect 
603-673-2306