Hi,

Omar Chiyean wrote:
> Thanks Andreas...
> It really worked. I've updated to 0.8.0, and ExtractText using
> the same syntax is working OK.
>
> Now I get this logging output. What is it?
>
> 22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: cs
> 22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: CS
> 22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: sc
> 22/10/2009 08:48:06 PM org.apache.pdfbox.util.PDFStreamEngine processOperator
> INFO: unsupported/disabled operation: SC
>
> Can I disable it, or how do I disable logging?

Yes. Have a look at [1]. The logging.properties file mentioned there is part
of the source distribution of PDFBox.
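
For reference, the operators named in those messages (cs, CS, sc, SC) are the PDF operators for setting color spaces and colors, so skipping them should not matter for text extraction. If editing logging.properties is inconvenient, the same effect can probably be reached in code. The following is only a minimal sketch. It assumes the messages are routed through java.util.logging, which the format of the output above suggests, and the class and method names are made up for illustration:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper class, not part of PDFBox.
class QuietPdfBoxLogging {
    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an otherwise unreferenced logger can be lost after GC.
    private static final Logger PDFBOX_LOGGER = Logger.getLogger("org.apache.pdfbox");

    /** Raises the threshold so INFO messages from org.apache.pdfbox.* loggers are dropped. */
    static void silenceInfoMessages() {
        PDFBOX_LOGGER.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        silenceInfoMessages();
        // Child loggers without their own level inherit it from "org.apache.pdfbox".
        Logger engine = Logger.getLogger("org.apache.pdfbox.util.PDFStreamEngine");
        System.out.println("INFO enabled: " + engine.isLoggable(Level.INFO));
        System.out.println("WARNING enabled: " + engine.isLoggable(Level.WARNING));
    }
}
```

Call silenceInfoMessages() once before starting the extraction. Using Level.OFF instead of Level.WARNING would suppress everything, including real warnings, so the higher threshold is the more cautious choice.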
[1] http://markmail.org/message/3wpukybujqsbfna5

> Thanks in advance...
>
> 2009/10/22 Andreas Lehmkühler <andr...@lehmi.de>
>
>> Hi,
>>
>> Omar Chiyean wrote:
>>> Hi there...
>>> I'm new to PDFBox. I'm extracting text from some PDFs and storing
>>> it in a String variable. My problem is that Latin characters such as
>>> accented letters do not come out as they should.
>>>
>>> How can I set the charset, or how can I see the charset returned by the
>>> TextStripper from PDFBox?
>>>
>>> I read it was UTF-16BE, but when I get the bytes with this charset and
>>> convert them to ISO-8859-1, I get letters separated by spaces and no luck
>>> with accented letters...
>>>
>>> So what's wrong, or can you help me correct this? I'm using PDFBox 0.7.3.
>>
>> First of all, I suggest updating to PDFBox 0.8. It includes a lot of
>> improvements and bug fixes. Back to your question: you are able to
>> choose the needed charset before extraction. Have a look at ExtractText
>> as an example of how to use the text extraction.
>>
>> BR
>> Andreas Lehmkühler

BR
Andreas Lehmkühler
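
As a side note on the "letters separated by spaces" symptom described in the quoted question: one plausible cause, independent of PDFBox, is decoding UTF-16BE bytes as if they were ISO-8859-1. The sketch below uses only the Java standard library, and the class and method names are hypothetical. It shows that this kind of mix-up puts a NUL character in front of every letter, which many viewers display as a gap, and that encoding the String directly into the target charset avoids it. It does not claim to reproduce what 0.7.3 did internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of PDFBox.
class CharsetRoundTrip {
    /** Encodes text as UTF-16BE, then decodes those bytes with the given (possibly wrong) charset. */
    static String decodeUtf16BeAs(String text, Charset charset) {
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16BE);
        return new String(utf16, charset);
    }

    /** The direct route: encode the String straight into ISO-8859-1. */
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "n\u00e9"; // "n" followed by e-acute, escaped to avoid source-encoding issues
        String wrong = decodeUtf16BeAs(text, StandardCharsets.ISO_8859_1);
        // Each UTF-16BE code unit is two bytes, so the wrong decoding doubles the length.
        System.out.println("length after wrong decoding: " + wrong.length());
        System.out.println("bytes after direct encoding: " + toLatin1(text).length);
    }
}
```

A Java String has no charset of its own. The charset only matters at the point where the text is turned into bytes, for example when writing it to a file or stream.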