We should handle the issues described at http://en.wikipedia.org/wiki/Windows-1252 universally, by intercepting the characters that differ and writing them out properly. HSSF should therefore force ISO-8859-1 encoding and then replace the characters that differ between the two encodings. If someone wants to contribute, I can point them in the right direction.
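
A minimal sketch of what such a replacement step might look like, using only the JDK and no POI API. It assumes the text was read as ISO-8859-1 although it was really Windows-1252, so the Windows-1252 extras show up as C1 control characters (U+0080..U+009F). The class and method names (Cp1252Fixup, fixC1) are hypothetical and are not part of HSSF.

```java
import java.nio.charset.Charset;

public class Cp1252Fixup {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /**
     * Replaces each C1 control character (U+0080..U+009F) with the character
     * that Windows-1252 assigns to the same byte value. All other characters
     * are copied unchanged.
     */
    public static String fixC1(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x80 && c <= 0x9F) {
                String mapped = new String(new byte[] {(byte) c}, CP1252);
                // A few byte values are unassigned in Windows-1252; the decoder
                // reports them as U+FFFD, in which case the original is kept.
                out.append("\uFFFD".equals(mapped) ? String.valueOf(c) : mapped);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Byte 0x9A is "s with caron" (U+0161) in Windows-1252.
        System.out.println(fixC1("\u009A").equals("\u0161"));
    }
}
```

Whether this belongs on the read path, the write path, or both is a design decision the sketch does not try to answer.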

-andy

Christian Gosch wrote:
Hi,

that would be of particular interest for me, too.

We have some international names in our application, although it runs in an
ISO-Latin-1 (ISO-8859-1) [db, appserver] / Cp1252 [client] environment with
deDE locale by default.

We have several areas of "visibility" like DB (VarChar fields), Java source
files, appserver console, JSP source / rendering / display, PDF and XLS
download.

We currently use the latest final POI release (should be 2.5.1?), and I do
not remember any way of setting the encoding for String values in a sheet.
Since the XLS file format is a kind of "hybrid", mixing binary structure /
control data with textual content data, it is crucial to write all textual
"content" in the appropriate encoding -- and that encoding should be
configurable.

Testing some examples, I found that
- the vast majority of characters in our data are displayed as they should
be, in JSP and XLS (by POI).
- the Czech "s with v on top" (s with caron) is displayed well in JSPs, but
not in POI-generated XLS: there it shows up as a "little rectangle".
I know that in ISO-8859-1 there are also problems with the Danish "o with
slash", but currently I have no test data. I would also expect problems with
Turkish letters like "i without dot" or "c with bottom accent", as in the
city name "Incirlik", when written correctly.
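
One way to check such cases without test data is to ask the JDK which characters each charset can encode. The sketch below is only an illustration with hypothetical names (EncodableCheck, fits). It tests the four letters mentioned above against ISO-8859-1 and Windows-1252, and assumes the "windows-1252" charset is available in the running JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodableCheck {
    /** Returns true if every character of text can be encoded in cs. */
    public static boolean fits(String text, Charset cs) {
        return cs.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        Charset cp1252 = Charset.forName("windows-1252");
        // s with caron, o with slash, dotless i, c with cedilla
        String[] samples = {"\u0161", "\u00F8", "\u0131", "\u00E7"};
        for (String s : samples) {
            System.out.printf("U+%04X latin1=%b cp1252=%b%n",
                    (int) s.charAt(0), fits(s, latin1), fits(s, cp1252));
        }
    }
}
```

By this check, "s with caron" (U+0161) fits Windows-1252 but not ISO-8859-1, "o with slash" (U+00F8) and "c with cedilla" (U+00E7) fit both, and the dotless i (U+0131) fits neither.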

btw:
In JXL (JExcelAPI) it is possible to set an encoding for a generated XLS
file, which by default is "the default encoding of the hosting VM", but it
took a while to make that happen.


Regards
Christian Gosch
inovex GmbH



On Tuesday, November 08, 2005 11:59 PM [GMT+1=CET],
Olivier Matt <[EMAIL PROTECTED]> wrote:


Hello,

I'm reading Excel files, and from a CELL_TYPE_STRING cell I get a String.

That String has some problems with accents (I guess the file is encoded
using some Latin character encoding): they are not displayed properly.

How can I avoid this behavior? Can I specify the encoding of the cells
somewhere?
Or is there a method for transforming misinterpreted strings into proper
Latin strings?
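
As a general illustration of the last question, and not a POI feature: if a string was decoded with the wrong charset and no bytes were lost along the way, it can sometimes be repaired by re-encoding it with the wrongly assumed charset and decoding the bytes with the real one. The names below (Redecode, redecode) are hypothetical. Which two charsets are involved must be known or guessed, and the round trip only works when the wrong decoding was lossless, as it is for ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Redecode {
    /**
     * Recovers the original bytes by encoding with the charset that was
     * wrongly assumed, then decodes them with the charset actually used.
     */
    public static String redecode(String garbled, Charset assumed, Charset actual) {
        return new String(garbled.getBytes(assumed), actual);
    }

    public static void main(String[] args) {
        // UTF-8 bytes C3 A9 ("e with acute") misread as two ISO-8859-1 characters.
        String garbled = "\u00C3\u00A9";
        String fixed = redecode(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(fixed.equals("\u00E9"));
    }
}
```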


Thanks for help,

Olivier

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
Mailing List:     http://jakarta.apache.org/site/mail2.html#poi
The Apache Jakarta Poi Project:  http://jakarta.apache.org/poi/




--
Andrew C. Oliver
SuperLink Software, Inc.

Java to Excel using POI
http://www.superlinksoftware.com/services/poi
Commercial support including features added/implemented, bugs fixed.

