Hi,

Omar Chiyean wrote:
> Hi there...
> I'm new to PDFBox and I'm extracting text from some PDFs and storing it
> in a String variable. My problem is that Latin characters such as accented
> letters do not come out as they should.
> 
> How can I set the charset, or how can I see the charset returned from the
> TextStripper from PDFBox?
> 
> I read it was UTF-16BE, but when I get the bytes with this charset and
> translate them to ISO-8859-1 I get letters separated by a space and no luck
> with accented letters...
> 
> So what's wrong, or can you help me to correct this? I'm using PDFBox 0.7.3

First of all, I suggest updating to PDFBox 0.8. It includes a lot of
improvements and bug fixes. Back to your question: you are able to
choose the charset you need before extraction. Have a look at ExtractText
as an example of how to use the text extraction.
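
To illustrate the charset point, here is a minimal sketch using only the JDK, not PDFBox itself. It assumes the extracted text is already held in a Java String, as described in the question; the class name CharsetDemo and the sample text are made up for illustration. It shows why reading UTF-16BE bytes as ISO-8859-1 puts a stray character in front of every letter, and how a charset can instead be chosen at the point where the text is written out. How ExtractText does this internally is not shown here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Hypothetical stand-in for text returned by the text stripper ("cafe" with e-acute).
    static final String EXTRACTED = "caf\u00e9";

    // Reproduces the symptom: UTF-16BE bytes misread as ISO-8859-1.
    // Each character ends up preceded by a NUL (0x00) character.
    static String misdecoded(String text) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);
        return new String(utf16, StandardCharsets.ISO_8859_1);
    }

    // Writes the text through a Writer that encodes with the chosen charset.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 4 characters become 8 because of the interleaved NUL characters.
        System.out.println("misdecoded length: " + misdecoded(EXTRACTED).length());
        // One byte per character when encoded as ISO-8859-1.
        System.out.println("ISO-8859-1 bytes: "
                + encode(EXTRACTED, StandardCharsets.ISO_8859_1).length);
    }
}
```

The String itself has no charset to "see"; an encoding only comes into play when the text is turned into bytes, so that is the place to pick ISO-8859-1, UTF-8, or whatever the target needs.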

BR
Andreas Lehmkühler
