On Feb 2, 2008, at 6:04 PM, woodcock wrote:
> I am starting with SAX and am trying to parse a file that contains
> non-ascii characters. The xml file uses 'ISO-8859-1'. When it parses
> text containing non-ascii characters the output is across multiple lines.

This is a fundamental property of the SAX interface: it doesn't mandate the splits, but it explicitly allows them, so a single run of text may be delivered across several characters() calls.

If you want something that buffers the text and provides it in larger chunks, that could be written as a proxy content handler. It might be nice if one were provided out of the box, since this is a common request. The basic issue is that a seriously huge amount of data may be enclosed between non-text calls, and one of the advantages of SAX is that it doesn't require loading large portions of the document into memory if the application doesn't need that.

  -Fred

--
Fred Drake <fdrake at acm.org>

_______________________________________________
XML-SIG maillist
http://mail.python.org/mailman/listinfo/xml-sig
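
As a rough illustration of the proxy content handler idea described in the reply, here is a minimal sketch using Python's xml.sax. The names TextBufferingProxy, Collector, and collect_text are made up for this example and are not part of the standard library. It assumes non-namespace mode (startElementNS/endElementNS are not forwarded), and it holds each complete text run in memory, which is exactly the trade-off noted above.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class TextBufferingProxy(ContentHandler):
    """Hypothetical proxy: merges consecutive characters() calls into one
    call on the wrapped handler. Only a subset of events is forwarded."""

    def __init__(self, downstream):
        super().__init__()
        self._downstream = downstream
        self._chunks = []

    def _flush(self):
        # Deliver any buffered text as a single characters() call.
        if self._chunks:
            self._downstream.characters("".join(self._chunks))
            self._chunks = []

    def characters(self, content):
        # Buffer instead of forwarding immediately.
        self._chunks.append(content)

    def startDocument(self):
        self._downstream.startDocument()

    def endDocument(self):
        self._flush()
        self._downstream.endDocument()

    def startElement(self, name, attrs):
        self._flush()
        self._downstream.startElement(name, attrs)

    def endElement(self, name):
        self._flush()
        self._downstream.endElement(name)

    def processingInstruction(self, target, data):
        self._flush()
        self._downstream.processingInstruction(target, data)


class Collector(ContentHandler):
    """Example application handler: records each characters() call."""

    def __init__(self):
        super().__init__()
        self.texts = []

    def characters(self, content):
        self.texts.append(content)


def collect_text(data):
    """Parse XML bytes and return the text runs seen through the proxy."""
    collector = Collector()
    xml.sax.parseString(data, TextBufferingProxy(collector))
    return collector.texts


if __name__ == "__main__":
    doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><doc>caf\xe9 cr\xe8me</doc>'
    print(collect_text(doc))
```

Flushing on every non-text event keeps the event order intact for the wrapped handler. An application that only needs the text of a few small elements could instead accumulate in its own characters() method and read the result in endElement().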
