On Feb 4, 4:09 pm, John Machin <[EMAIL PROTECTED]> wrote:
> On Feb 5, 9:02 am, JKPeck <[EMAIL PROTECTED]> wrote:
>
> > On Feb 2, 12:56 am, Jeroen Ruigrok van der Werven <[EMAIL PROTECTED]
> > nomine.org> wrote:
> > > -On [20080201 19:06], JKPeck ([EMAIL PROTECTED]) wrote:
> >
> > > >In both of these cases, there are only plain, 7-bit ascii characters
> > > >in the xml, and it really is valid utf-16 as far as I can tell.
> >
> > > Did you mean to say that the only characters they used in the UTF-16
> > > encoded
> > > file are characters from the Basic Latin Unicode block?
> >
> > It appears that the root cause of this problem is indeed passing a
> > Unicode XML string to xml.sax.parseString with an encoding declaration
> > in the XML of utf-16. This works with the standard distribution on
> > Windows.
>
> It did NOT work for me with the standard 2.5.1 Windows distribution --
> see the code + output that I posted.
>
> > It does not work with ActiveState on Windows even though
> > both distributions report
> > 64K for sys.maxunicode.
>
> > So I don't know why the results are different, but the problem is
> > solved by encoding the Unicode string into utf-16 before passing it to
> > the parser.
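For anyone finding this thread later, here is a minimal sketch of the workaround quoted above: encode the Unicode string to utf-16 bytes first, so that what the parser receives matches the encoding declaration. It is written for a current Python 3 rather than the 2.5.x builds discussed here, and the handler class and sample document are made up for illustration; they are not the original code.

```python
import xml.sax


class ElementCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records element names and character data."""

    def __init__(self):
        super().__init__()
        self.elements = []
        self.chunks = []

    def startElement(self, name, attrs):
        self.elements.append(name)

    def characters(self, content):
        self.chunks.append(content)


# Sample document (an assumption, not the original data): plain ASCII
# content, but declared as utf-16 in the XML declaration.
xml_text = (
    '<?xml version="1.0" encoding="utf-16"?>'
    "<root><item>plain ascii</item></root>"
)

# The workaround: encode to utf-16 before parsing. The "utf-16" codec
# prepends a byte order mark, which lets the parser detect the byte order.
xml_bytes = xml_text.encode("utf-16")

handler = ElementCollector()
xml.sax.parseString(xml_bytes, handler)

print(handler.elements)
print("".join(handler.chunks))
```

The point is only that the bytes handed to xml.sax.parseString agree with the declaration in the document; whether the un-encoded string is accepted seems to depend on the build, as the rest of this thread shows.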
Interesting. In the course of installing and testing with ActiveState, I upgraded the standard distribution from 2.5.0 to 2.5.1. The former worked; the latter does not (with the original code). So that .1 seems to matter here, and it probably accounts for why ActiveState raised the exception and the standard 2.5.0 did not.

-Jon