"The Microsoft file formats generally store text as either US-ASCII or
UCS-2. The type of the record/block/etc. tells you which it is, so we can
turn that into Java (Unicode) strings."
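
To make the quoted point concrete, here is a minimal sketch of that kind of
decoding step. It is not POI's actual code: the class name, the method name
and the isWide flag are hypothetical stand-ins for whatever the record type
reports, and it assumes the two-byte text is little-endian, using UTF-16LE
as a stand-in for UCS-2.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of the POI API.
final class RecordTextDemo {

    // Decode the raw bytes of a record into a Java String.
    // isWide is an assumed flag standing in for "the record type says
    // this text is stored as two-byte characters".
    static String decode(byte[] raw, boolean isWide) {
        // Assumption: two-byte text is little-endian; UTF-16LE covers UCS-2.
        Charset charset = isWide ? StandardCharsets.UTF_16LE
                                 : StandardCharsets.US_ASCII;
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        byte[] narrow = {0x50, 0x4F, 0x49};       // "POI", one byte per char
        byte[] wide = {0x42, 0x30, 0x44, 0x30};   // U+3042 U+3044, two bytes per char

        // Print lengths rather than the text itself, so the result does
        // not depend on the console's own encoding.
        System.out.println(decode(narrow, false).length()); // 3
        System.out.println(decode(wide, true).length());    // 2
    }
}
```

Once the bytes have been decoded this way, the result is an ordinary Java
String, and the encoding the file used on disk no longer matters to the
caller.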

Thanks for the input, Nick. One thing is still not clear, though: can I
encode the text as UTF-8?

When I try to extract non-English text such as French or Japanese, the
output is incomprehensible.

Is there any way to encode non-English text using POI?


Regards,
Som

On Tue, Sep 8, 2009 at 3:18 PM, Nick Burch <[email protected]> wrote:

> On Tue, 8 Sep 2009, Som Satpathy wrote:
>
>> Does Apache POI follow any particular encoding internally while extracting
>> MS Office documents? If so, what is the encoding that POI uses?
>>
>
> POI is written in Java, so it uses native Java strings almost everywhere.
> These are Unicode.
>
> The Microsoft file formats generally store text as either US-ASCII or
> UCS-2. The type of the record/block/etc. tells you which it is, so we can
> turn that into Java (Unicode) strings.
>
> Nick
