On Tue, 8 Sep 2009, Som Satpathy wrote:
Does Apache POI follow any particular encoding internally while extracting MS Office documents? If so, what is the encoding that POI uses?

POI is written in Java, so it uses native Java strings almost everywhere. These are Unicode.

The Microsoft file formats generally store text as either US-ASCII or UCS-2. The type of the record/block/etc. tells you which it is, so we can turn that into Java (Unicode) strings.
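
As a rough illustration of that last step, here is a minimal sketch of turning raw record bytes into a Java string. It is not POI's actual code: the class RecordTextDecoder, the method decode and the is16Bit flag are hypothetical names made up for this example, and it assumes the flag has already been read from the record header. It also assumes the 16-bit text is little-endian and decodes it as UTF-16LE, which gives the same result as UCS-2 for characters in the Basic Multilingual Plane.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only, not part of POI.
final class RecordTextDecoder {

    // Assumption: the caller has already worked out from the record type
    // or a flag byte whether the text is stored as 16-bit units.
    static String decode(byte[] raw, boolean is16Bit) {
        if (is16Bit) {
            // Two bytes per character, low byte first (little-endian).
            return new String(raw, StandardCharsets.UTF_16LE);
        }
        // One byte per character.
        return new String(raw, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] narrow = {0x48, 0x69};           // "Hi" as one byte per char
        byte[] wide = {0x48, 0x00, 0x69, 0x00}; // "Hi" as two bytes per char
        System.out.println(decode(narrow, false));
        System.out.println(decode(wide, true));
    }
}
```

Whichever branch is taken, the caller gets back an ordinary Java String, so code further up never needs to know how the text was stored in the file.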

Nick
