> From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, May 02, 2002 3:20 PM
> To: '[EMAIL PROTECTED]'
> Subject: Re: Question about charset handling in HSSF
>
> On Thu, 2002-05-02 at 17:49, Carey Sublette wrote:
> > I have a question - how does hssf handle the conversion of the windows 1252
> > charset (the Microsoft 'customized' version of ISO Latin-1) to Unicode
> > strings?
> >
> > I just did a test with reading an xls format spread sheet that had a cell
> > with character values from 0x80 to 0x9f, all of them non-printing characters in
> > Latin-1/Unicode, but with printable symbols assigned (at least to most of
> > them) in Windows 1252.
> >
> > When I wrote the characters out to a file using FileOutputStream I found
> > that most of them were being written as 0x3f ("?"), with a few of them
> > having their original values (0x81, 0x8d, 0x8f, 0x90, 0x9d).
> >
> > Is there 1252->Unicode encoding conversion being done (I am running on
> > Unix)?
>
> You know I don't think we're doing a lot with them yet.... To see what
> we're doing look here:
>
> http://cvs.apache.org/viewcvs/jakarta-poi/src/java/org/apache/poi/util/StringUtil.java?rev=1.1.1.1&content-type=text/vnd.viewcvs-markup
>
> Do a google search for demoronise.pl a/o demoronize.pl - You'll find a
> perl algorithm to *handle* this. If you can perhaps create an api
> switch or two to turn it on and off that would be nice along with
> patches.
Yes, I am familiar with charset mapping approaches and Perl scripts for handling this. I'll work on something to permit control over encodings for 1252, Latin-1, and Unicode.

Carey Sublette
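
For readers following the thread, here is a minimal Java sketch of the 1252-to-Unicode mapping being discussed. It is not HSSF code and not the patch proposed above: the class name Cp1252Demo and both helper methods are hypothetical, and it assumes the JDK's built-in "windows-1252" charset is available. It also illustrates one plausible (unconfirmed) reason for the 0x3f bytes: once a byte such as 0x80 has been decoded to its Unicode character, re-encoding it as Latin-1 has no target byte, so "?" is substituted.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here is part of the POI/HSSF API.
class Cp1252Demo {
    // Assumption: the JDK's built-in charset for Microsoft code page 1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decode raw single-byte text as Windows-1252 into a Java String. */
    static String decodeCp1252(byte[] raw) {
        return new String(raw, CP1252);
    }

    /** Encode as ISO-8859-1; characters with no Latin-1 byte become '?' (0x3f). */
    static byte[] encodeLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 0x80 (euro sign), 0x93 (left double quote), 0x99 (trade mark) in 1252.
        byte[] raw = { (byte) 0x80, (byte) 0x93, (byte) 0x99 };
        String text = decodeCp1252(raw);

        // Show the Unicode code point each 1252 byte was mapped to.
        for (char c : text.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
        // Show what a Latin-1 writer would put in the file for the same text.
        for (byte b : encodeLatin1(text)) {
            System.out.printf("0x%02x%n", b & 0xff);
        }
    }
}
```

Writing the decoded text through an OutputStreamWriter constructed with an explicit charset such as UTF-8, rather than pushing bytes through a bare FileOutputStream, would keep these characters intact. The five byte values with no assignment in 1252 (0x81, 0x8d, 0x8f, 0x90, 0x9d) are deliberately left out of the sketch, since how they decode depends on the runtime.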
