Re: problems parsing an xls spreadsheet

Nick Burch Tue, 21 Dec 2010 02:28:52 -0800

On Tue, 21 Dec 2010, Shaun Cutts wrote:

ok, but when I call parse, my ContentHandler.characters() callback gets a char[], and this is passed as:
(Pdb) ch
array('c', '\xa9 2010 Crane Data LLC. All rights reserved.')

so when I try unicode I get an error:

(Pdb) ch.tounicode()
*** ValueError: tounicode() may only be called on type 'u' arrays

You sure there isn't a problem with your python-java bridge? All Java strings are always unicode.
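
For what it's worth, here is a rough sketch of one way to get text back out of a buffer like the one shown above. It assumes, and this is only a guess about the bridge, that each Java char was narrowed to a single byte holding a code point below 256, in which case Latin-1 maps every byte straight back to its character. The name `raw` is a hypothetical stand-in for `ch`, and the sketch uses Python 3, where the 'c' typecode no longer exists, so 'B' is used instead.

```python
from array import array

# Hypothetical stand-in for the char[] the bridge delivered: one byte per char.
raw = array('B', b'\xa9 2010 Crane Data LLC. All rights reserved.')

# Assumption: every byte is a code point below 256, so Latin-1 decodes it 1:1.
text = raw.tobytes().decode('latin-1')

print(text)
```

If the bridge is really handing over bytes in some other encoding, this decode will still succeed but give the wrong characters, so treat it as a diagnostic rather than a fix.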

So it would seem to me that in fact I'm not getting a unicode string here. When I try to decode in various codecs, I get problems. One question is what is the standard name for "UCS-2" -- as when I try to use that name it fails; is it a subset of utf-16?

UCS-2 is a predecessor to UTF-16. Unlike UTF-16, it doesn't handle supplementary code points, so it can't hold the whole of the unicode range.

http://en.wikipedia.org/wiki/UTF-16/UCS-2
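
A small sketch of the difference, in Python 3, in case it helps. The sample characters are my own choice for illustration. For a character in the Basic Multilingual Plane, UTF-16 uses a single 16-bit unit, the same as UCS-2 would; a supplementary code point needs a surrogate pair, which UCS-2 has no way to express. The codec lookup at the end is only a probe: as far as I know Python ships no codec called "ucs-2", which would match the failure described above.

```python
import codecs

bmp = '\u00a9'        # COPYRIGHT SIGN, inside the Basic Multilingual Plane
supp = '\U0001D11E'   # MUSICAL SYMBOL G CLEF, a supplementary code point

# Big-endian without a BOM, so the byte counts reflect code units only.
bmp_bytes = bmp.encode('utf-16-be')
supp_bytes = supp.encode('utf-16-be')

print(len(bmp_bytes) // 2)    # number of 16-bit units for the BMP character
print(len(supp_bytes) // 2)   # number of 16-bit units for the supplementary one

# Probe whether a codec named "ucs-2" is registered in this interpreter.
try:
    codecs.lookup('ucs-2')
    has_ucs2 = True
except LookupError:
    has_ucs2 = False
print(has_ucs2)
```

So for text that stays inside the BMP, decoding with 'utf-16' (or the explicit 'utf-16-le' / 'utf-16-be' variants) should behave the way a UCS-2 decoder would.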

Nick
