Re: [poi] Problem with encoding

acoliver Thu, 10 Nov 2005 05:18:47 -0800

Christian Gosch wrote:

However, there are other Windows localizations around the world which don't
use Cp1252 -- to go to the extremes, look at Asian versions supporting
Traditional Chinese, Japanese or the like. Even Russian, Korean and Hebrew
versions are sold, and they all have completely different charsets -- and
the interesting thing about it is that I can view / edit files created there
on other localizations! So there *must* be *some* way to (a) encode it and
(b) tell what the actual encoding is.

You're actually out of luck with any right-to-left language ATM unless
someone takes the time to decode the "special" "undocumented" bits
necessary to run them. Also there are special "far east" bits in the
file format that we preserve but have not reverse engineered. The OO.o
guys also seem to have not yet done this (these were never documented
anywhere and you'd need to know Chinese or Japanese to really do this work).

Look at the implementation of JXL. It is not my work, but we used it a lot
before POI. In the first versions, there was no idea of supporting something
like non-US chars, but after some weeks of discussion the developer of JXL
got the message, and *did* implement support of different encodings. Since
this product is open source, it should be possible. Look for JExcelAPI
(http://www.andykhan.com/jexcelapi/).

We should not and cannot look at it. Its license means we might be
producing a derivative work and encumbering POI. (They are not bound by
such a problem due to the ASL license being so permissive.) Fortunately,
we're pretty good at figuring things out ourselves.

Please, don't let all of us non-US developers in 'good old Europe' and
wherever else starve for lack of custom encoding support...

Generally speaking, insulting people tends to make them stop helping you.
POI was used on a widespread basis first in Germany due to its
coverage in the German tech media long before it was used in the US.
There are only three committers from the US on the project. Most of the
committers are from other parts of the world.


Tschüs,

-Andy

Regards
Christian Gosch
inovex GmbH


On Wednesday, November 09, 2005 11:07 PM [GMT+1=CET],
[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

Excel wants cp1252 for most things...  It just does.  When I get home
(I'm on the road) I'll look at the dev kit...it may be that by
changing the codepage record we can handle things a bit nicer, but
it's kinda picky about that and regardless of what AIX may support, when
you open the Excel sheet it will be on Windows generally (or a
semi-emulation of it on Mac/Linux) and you'll have to write it in an
encoding supported by Excel for Windows...

-Andy

Rainer Klute wrote:

On Wednesday, 09.11.2005, at 07:25 -0500, [EMAIL PROTECTED] wrote:

We should be universally handling the issues mentioned here:
http://en.wikipedia.org/wiki/Windows-1252 by intercepting character
differences and writing them out properly.  Thus HSSF should force
8859-1 encoding but should then kind of do a replace on the
characters. If someone wants to contribute I can point them in the
right direction.
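
To illustrate the character differences mentioned above, here is a minimal
sketch in plain Java (not POI code; the class and method names are
hypothetical). It assumes the JVM provides the "windows-1252" charset, and
lists the byte values that decode differently under Windows-1252 and
ISO 8859-1. These are the values a "replace" step would have to intercept.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of POI.
class Cp1252VsLatin1 {
    // Assumption: the running JVM ships the windows-1252 charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode a single byte value (0-255) with the given charset.
    static String decode(int byteValue, Charset cs) {
        return new String(new byte[] { (byte) byteValue }, cs);
    }

    // True if the byte decodes to different characters in the two charsets.
    static boolean differs(int byteValue) {
        return !decode(byteValue, CP1252)
                .equals(decode(byteValue, StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        // Print every byte value whose meaning depends on the charset.
        for (int b = 0; b < 256; b++) {
            if (differs(b)) {
                System.out.printf("0x%02X: cp1252=U+%04X  8859-1=U+%04X%n",
                        b,
                        (int) decode(b, CP1252).charAt(0),
                        (int) decode(b, StandardCharsets.ISO_8859_1).charAt(0));
            }
        }
    }
}
```

For example, byte 0x80 is the euro sign (U+20AC) in Windows-1252 but a
control character (U+0080) in ISO 8859-1, while ordinary letters such as
0x41 ('A') are identical in both.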


Um, no. Enforcing ISO 8859-1 as the character encoding would be of limited
use only. The reason is that, like Windows codepage 1252, it
represents only a limited set of characters. UTF-8 is the preferred
character encoding. However, POI should not forbid creating strings
in other character encodings, be it ISO 8859-1, cp1252 or whatever.
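
The "limited set of characters" point can be checked with the standard
java.nio.charset API. The following is only a sketch under that assumption;
the class and method names are hypothetical and it does not use POI. It asks
each charset whether it can represent a given string at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of POI.
class EncodableCheck {
    // True if every character of text can be encoded in the given charset.
    static boolean fits(String text, Charset cs) {
        return cs.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Assumption: the running JVM ships the windows-1252 charset.
        Charset cp1252 = Charset.forName("windows-1252");
        String euro = "\u20AC";          // euro sign
        String kanji = "\u65E5\u672C";   // two Japanese characters

        System.out.println("euro  in 8859-1: " + fits(euro, StandardCharsets.ISO_8859_1));
        System.out.println("euro  in cp1252: " + fits(euro, cp1252));
        System.out.println("kanji in cp1252: " + fits(kanji, cp1252));
        System.out.println("kanji in UTF-8 : " + fits(kanji, StandardCharsets.UTF_8));
    }
}
```

Neither single-byte charset can hold the Japanese sample, and ISO 8859-1
cannot even hold the euro sign, whereas UTF-8 accepts all of them.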

By the way, HPSF does a nice job of supporting a lot of different
character encodings. At least there are no problems I am aware of. I
suggest you have a look at it.

Best regards
Rainer Klute

                         Rainer Klute IT-Consulting GmbH
Dipl.-Inform.
Rainer Klute             E-Mail:  [EMAIL PROTECTED]
Körner Grund 24           Telefon: +49 172 2324824
D-44143 Dortmund           Telefax: +49 231 5349423

Public key fingerprint: E4E4386515EE0BED5C162FBB5343461584B5A42E



Regards,



