Re: [XML-SIG] SAX characters() output on multiple lines for non-ascii

Fred Drake Sat, 02 Feb 2008 19:04:25 -0800

On Feb 2, 2008, at 6:04 PM, woodcock wrote:
> I am starting with SAX and am trying to parse a file that contains
> non-ascii characters.  The xml file uses 'ISO-8859-1'.  When it parses
> text containing non-ascii characters the output is across multiple
> lines.


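This is a fundamental issue with the SAX interface: a parser may deliver
a single run of text through several characters() calls, so printing
each call separately shows the text broken across lines.  The interface
doesn't mandate the splits, but states that they're allowed.  If you
want something that buffers the text and provides it in larger chunks,
that could be written as a proxy content handler.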
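A minimal sketch of such a proxy, assuming Python 3 and only the standard library's xml.sax.  The class names CoalescingHandler and Recorder are hypothetical (nothing like them ships with xml.sax); the proxy buffers characters() chunks and hands them to the wrapped handler as one call before passing on any other event.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class CoalescingHandler(ContentHandler):
    """Hypothetical proxy (not part of xml.sax): merges adjacent
    characters() chunks and forwards them as a single call."""

    def __init__(self, downstream):
        super().__init__()
        self._downstream = downstream  # the application's real handler
        self._chunks = []              # text received since the last event

    def _flush(self):
        # Hand all buffered text to the downstream handler in one call.
        if self._chunks:
            self._downstream.characters("".join(self._chunks))
            self._chunks = []

    def characters(self, content):
        self._chunks.append(content)  # buffer instead of forwarding

    # Events that can follow text flush the buffer first, so ordering is kept.
    def setDocumentLocator(self, locator):
        self._downstream.setDocumentLocator(locator)

    def startDocument(self):
        self._downstream.startDocument()

    def endDocument(self):
        self._flush()
        self._downstream.endDocument()

    def startPrefixMapping(self, prefix, uri):
        self._flush()
        self._downstream.startPrefixMapping(prefix, uri)

    def endPrefixMapping(self, prefix):
        self._flush()
        self._downstream.endPrefixMapping(prefix)

    def startElement(self, name, attrs):
        self._flush()
        self._downstream.startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        self._downstream.endElement(name)

    def startElementNS(self, name, qname, attrs):
        self._flush()
        self._downstream.startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        self._flush()
        self._downstream.endElementNS(name, qname)

    def ignorableWhitespace(self, whitespace):
        self._flush()
        self._downstream.ignorableWhitespace(whitespace)

    def processingInstruction(self, target, data):
        self._flush()
        self._downstream.processingInstruction(target, data)

    def skippedEntity(self, name):
        self._flush()
        self._downstream.skippedEntity(name)


# Example: a downstream handler that records each characters() call.
class Recorder(ContentHandler):
    def __init__(self):
        super().__init__()
        self.texts = []

    def characters(self, content):
        self.texts.append(content)


doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
       '<r>caf\xe9 cr\xe8me</r>').encode("iso-8859-1")
recorder = Recorder()
xml.sax.parseString(doc, CoalescingHandler(recorder))
print(recorder.texts)  # prints ['café crème']
```

However the parser splits the text internally, the application's handler sees one characters() call per run of text.  The price is that the whole run sits in memory until the next non-text event.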

It might be nice if one were provided out of the box, since this is a
common request, but the basic issue is that some seriously huge
amounts of data may be enclosed between non-text events (element
boundaries and the like), and one of the advantages of SAX is that it
doesn't require loading large portions of the document into memory if
the application doesn't need it.
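One possible compromise for the memory concern, again only a hypothetical sketch and not part of xml.sax, is to cap the buffer: adjacent chunks are merged, but the buffer is flushed early once it holds max_chars characters (an assumed parameter, here defaulting to 64K).  Memory stays bounded at roughly max_chars plus one incoming chunk, at the cost of occasionally splitting a long run of text again.  For brevity this version forwards only document and element events.

```python
from xml.sax.handler import ContentHandler


class BoundedCoalescingHandler(ContentHandler):
    """Hypothetical proxy: merges adjacent characters() chunks but flushes
    as soon as max_chars characters have been buffered."""

    def __init__(self, downstream, max_chars=64 * 1024):
        super().__init__()
        self._downstream = downstream
        self._max_chars = max_chars
        self._chunks = []
        self._size = 0  # total length of buffered text

    def _flush(self):
        if self._chunks:
            self._downstream.characters("".join(self._chunks))
            self._chunks = []
            self._size = 0

    def characters(self, content):
        self._chunks.append(content)
        self._size += len(content)
        if self._size >= self._max_chars:
            self._flush()  # bound memory: don't wait for the next event

    # Only document and element events are forwarded in this short version;
    # a full proxy would also flush and forward the remaining ContentHandler
    # events (processing instructions, namespace mappings, and so on).
    def startDocument(self):
        self._downstream.startDocument()

    def endDocument(self):
        self._flush()
        self._downstream.endDocument()

    def startElement(self, name, attrs):
        self._flush()
        self._downstream.startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        self._downstream.endElement(name)


class Recorder(ContentHandler):
    """Example downstream handler that records each characters() call."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def characters(self, content):
        self.texts.append(content)


# Drive the proxy by hand to show the cap; normally the parser makes these calls.
recorder = Recorder()
proxy = BoundedCoalescingHandler(recorder, max_chars=4)
for piece in ["ab", "cd", "ef"]:
    proxy.characters(piece)
proxy.endElement("r")
print(recorder.texts)  # prints ['abcd', 'ef']
```

Setting max_chars very large gives back the fully coalescing behaviour; setting it small keeps SAX's streaming memory profile.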


   -Fred

-- 
Fred Drake   <fdrake at acm.org>




_______________________________________________
XML-SIG maillist  -  XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
