RE: Question about charset handling in HSSF

Carey Sublette Fri, 03 May 2002 06:54:00 -0700

From: Andrew C. Oliver [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, May 02, 2002 3:20 PM
> To: '[EMAIL PROTECTED]'
> Subject: Re: Question about charset handling in HSSF
> 
> 
> On Thu, 2002-05-02 at 17:49, Carey Sublette wrote:
> > I have a question - how does hssf handle the conversion of the windows 1252
> > charset (the Microsoft 'customized' version of ISO Latin-1) to Unicode
> > strings?
> > 
> > I just did a test reading an xls format spreadsheet that had a cell
> > with character values from 0x80 to 0x9f, all of them non-printing
> > characters in Latin-1/Unicode, but with printable symbols assigned (at
> > least to most of them) in Windows 1252.
> > 
> > When I wrote the characters out to a file using FileOutputStream I found
> > that most of them were being written as 0x3f ("?"), with a few of them
> > having their original values (0x81, 0x8d, 0x8f, 0x90, 0x9d).
> > 
> > Is there 1252->Unicode encoding conversion being done (I am running on
> > Unix)?
> > 
> 
> You know I don't think we're doing a lot with them yet....  To see what
> we're doing look here:
> 
> http://cvs.apache.org/viewcvs/jakarta-poi/src/java/org/apache/poi/util/StringUtil.java?rev=1.1.1.1&content-type=text/vnd.viewcvs-markup
> 
> Do a Google search for demoronise.pl and/or demoronize.pl - you'll find a
> Perl algorithm to *handle* this.  If you could create an API switch or
> two to turn it on and off, that would be nice, along with patches.


Yes, I am familiar with charset mapping approaches and Perl scripts for
handling this. 

I'll work on something to permit control over encodings for 1252, Latin-1,
and Unicode.
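
As a rough illustration of the kind of control I have in mind, here is a minimal sketch. It is not HSSF code: the class and method names (CharsetSketch, decode, encode) are made up for the example, and it assumes a current JDK where java.nio.charset is available and the "windows-1252" charset is installed. It shows that the same bytes decode to different Unicode characters depending on the charset chosen, and that encoding characters a target charset cannot represent produces 0x3f ("?"), which would be consistent with what I saw in my output file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of HSSF or POI.
final class CharsetSketch {
    // Assumes the JDK ships the windows-1252 charset (standard JDKs do).
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Interpret raw single-byte data using the caller's choice of charset.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    // Encode for output; String.getBytes(Charset) substitutes the charset's
    // replacement byte ('?' for ISO-8859-1) for unmappable characters.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Euro sign and curly double quotes in Windows 1252.
        byte[] raw = { (byte) 0x80, (byte) 0x93, (byte) 0x94 };

        String asCp1252 = decode(raw, CP1252);                      // U+20AC U+201C U+201D
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1); // U+0080 U+0093 U+0094

        // Latin-1 cannot represent U+20AC etc., so each becomes 0x3f.
        System.out.println(hex(encode(asCp1252, StandardCharsets.ISO_8859_1)));
        // UTF-8 can represent them, so nothing is lost.
        System.out.println(hex(encode(asCp1252, StandardCharsets.UTF_8)));
        // Decoding and encoding with the same charset round-trips the bytes.
        System.out.println(hex(encode(asLatin1, StandardCharsets.ISO_8859_1)));
    }
}
```

The point of passing the Charset explicitly on both sides is that neither reading nor writing then depends on the platform default, which differs between Windows and Unix.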

Carey Sublette
