Thanks Andreas... It really worked. I've updated to 0.8.0 and checked that ExtractText, using the same syntax, is working OK.
Now I get this logging output. What is it?

22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: cs
22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: CS
22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: sc
22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
INFO: unsupported/disabled operation: SC

Can I disable these messages, or how do I disable logging altogether?

Thanks in advance...

2009/10/22 Andreas Lehmkühler <andr...@lehmi.de>

> Hi,
>
> Omar Chiyean wrote:
> > Hi there...
> > I'm new to PDFBox and I'm extracting text from some PDFs and storing
> > it in a String variable. My problem is that Latin characters such as
> > accented letters do not come out as they should.
> >
> > How can I set the charset, or how can I see which charset is returned
> > by the TextStripper from PDFBox?
> >
> > I read it was UTF-16BE, but when I get the bytes with this charset and
> > convert them to ISO-8859-1, I get letters separated by spaces and no
> > luck with accented letters...
> >
> > So what's wrong, or can you help me correct this? I'm using PDFBox 0.7.3.
>
> First of all, I suggest updating to PDFBox 0.8. It includes a lot of
> improvements and bugfixes. Back to your question: you are able to
> choose the needed charset before extraction. Have a look at ExtractText
> as an example of how to use the text extraction.
>
> BR
> Andreas Lehmkühler
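
On the logging question: the format of the INFO lines looks like the default java.util.logging output, so one possible way to silence them is to raise the level of the logger named after the class shown in the messages. This is only a sketch under that assumption. The class name QuietPdfBoxLogging and the method silenceInfoMessages are made up for illustration and are not part of PDFBox. If the messages are routed through a different logging backend, that backend's own configuration would have to be used instead.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of PDFBox.
// Assumes the messages are emitted through java.util.logging.
public class QuietPdfBoxLogging {

    // Logger name taken from the class shown in the log output.
    static final String STREAM_ENGINE = "org.apache.pdfbox.util.PDFStreamEngine";

    // java.util.logging only holds loggers weakly, so keep a strong
    // reference; otherwise the configured level could be lost after GC.
    private static final Logger STREAM_ENGINE_LOGGER = Logger.getLogger(STREAM_ENGINE);

    /** Raises the threshold so INFO messages from this logger are dropped. */
    public static void silenceInfoMessages() {
        STREAM_ENGINE_LOGGER.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        // Call once at startup, before running the text extraction.
        silenceInfoMessages();

        // INFO is now below the threshold and is not printed.
        STREAM_ENGINE_LOGGER.info("unsupported/disabled operation: cs");
        // WARNING and above still get through.
        STREAM_ENGINE_LOGGER.warning("warnings are still reported");
    }
}
```

Using Level.OFF instead of Level.WARNING would drop everything from that logger, and passing "org.apache.pdfbox" as the logger name would apply the level to the whole package tree rather than to one class.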